cuFFTXt change descriptor XT_FORMAT manually

dejvbayer · January 22, 2023, 8:59am

Hello,

I use cuFFT library with multiple gpus (Xt) to accelerate my programme. I’ve run into some issues. Unfortunatelly I cannot see any good solution. The programme simulates wave propagation in a loop that uses R2C and C2R transfomations.

First I create R2C and C2R cufftXt plans. Then I create 3 cudaLibXt descriptors using cufftXtMalloc:

X with format INPLACE (will be initialized with data),
Y with format INPLACE_SHUFFLED and
Z with format INPLACE_SHUFFLED

After I copy in the initial data to X using cufftXtCopy, I begin the main loop.

First a R2C transformation is performed over X. Then a kernel that computes over complex data in X is launched and it computes new values for X, Y and Z based on X. Now I perform a C2R transfomation over X, Y and Z and use the data from all descriptors.

The problem is that after each iteration I end up with X in INPLACE format (that is ok) but Y and Z are in INPLACE format as well, but I need them in INPLACE_SHUFFLED format for next iteration. I dont need to perform a R2C transformation over them to put them in that format. I would like to just change their current state, I know all their data would be invalid then, but I would overwrite them anyway.

The solution that came into my mind is that I could allocate a new descriptor for Y and Z every iteration using INPLACE_SHUFFLED format, but that is not a good enough solution for me.

So I would like to ask if there is an other way, or if it could be possibly created, like some cufftXtSetDescriptorState function.

Thank you very much.

Dave

dejvbayer · January 23, 2023, 7:20am

Hi, I think I have solved the problem. It seems that the current data distribution is only tracked in member libFormat of struct cudaLibXtDesc. It flips from between INPLACE (real) and INPLACE_SHUFFLED (complex) after each transformation. If you manually change it to desired format, it will work.

WARNING: It only overwrites the format. It does not do the data exchange among GPUs!

Hopefully it will not change.

Thanks.

Dave

aaryan_kaushik · October 15, 2024, 12:06pm

Hi,
Do you know if there is a way to exchange the data?
My problem is also similar.

Here is the simplified version of my code:

// A simplified version of the kernel
__global__ void modify_arr(desc_data, arr) {
    idx = threadIndex;
    arr[idx] = desc_data[idx] * A + B; // A and B are constants
}

cufftXtSubFormat_t FORMAT = CUFFT_XT_FORMAT_INPLACE;
cufftHandle plan;
// ... Initialize 2D plan
cudaLibXtDesc* desc;
// ... Allocate desc on GPUs using cufftXtMalloc with given FORMAT

double *arr[num_gpus];
for (gpu_id = 0; gpu_id < num_gpus; gpu_id++) {
    cudaSetDevice(gpu_id);
    int numElements = desc->descriptor->size[gpu_id] / sizeof(double);
    // Allocate numElements to arr[gpu_id] on this GPU
    cudaMalloc(&arr[gpu_id], numElements * sizeof(double));
}

// Perform 2D FFT
cufftXtExecDescriptor(plan, desc, desc, CUFFT_FORWARD);

// Modify arr using the data of `desc` on the same GPU
for (gpu_id = 0; gpu_id < num_gpus; gpu_id++) {
    cudaSetDevice(gpu_id);
    modify_arr<<<...>>>(desc->descriptor->data[gpu_id], arr[gpu_id]);
}

Here, I want to modify the idx value of arr using the idx value of the data of desc descriptor. But after FFT, the data is exchanged between the GPUs and the original order is lost.
Specifically, is there any way to perform (2D/3D) FFT on multiple GPUs so that if x[i] was present on GPU g, then y[i] (y represents the result of FFT) will also be present on the same GPU at the same location.

Thank you very much
Aaryan

dejvbayer · October 16, 2024, 8:16am

Hi,

I wrote a function performing the reorder from CUFFT_XT_FORMAT_INPLACE_SHUFFLED to CUFFT_XT_FORMAT_INPLACE after R2C transform using cudaMemcpy3DPeer for 3D case. I didn’t test it, there may be a bug, but it can give you the idea what to do. You can implement the 2D case the same way, just work with X and Y axes and set Z size to 1.

#include <cstddef>
#include <algorithm>

#include <cuda_runtime.h>
#include <cufftXt.h>

void shuffledToNatural3D(void* const*       src,
                         void* const*       dst,
                         const int*         devices,
                         const std::size_t  deviceCount,
                         const std::size_t* n)
{
  const std::size_t z    = n[0];      // z dimension size
  const std::size_t y    = n[1];      // y dimension size
  const std::size_t x    = n[2];      // x dimension size
  const std::size_t xRed = x / 2 + 1; // reduced x dimension size

  for (std::size_t i = 0uz; i < deviceCount; ++i)
  {
    // devices[i]'s local y dimension size
    const std::size_t locSrcY = y / deviceCount + (i < y % deviceCount);

    for (std::size_t j = 0uz; j < deviceCount; ++j)
    {
      // devices[j]'s local z dimension size
      const std::size_t locDstZ = z / deviceCount + (j < z % deviceCount); 

      // z position from where read the slab
      const std::size_t srcZPos = j * (z / deviceCount) + std::min(j, z % deviceCount);
      
      // y position where to write the slab
      const std::size_t dstYPos = j * (y / deviceCount) + std::min(j, y % deviceCount);
      
      cudaMemcpy3DPeerParms params{};

      params.srcPos       = make_cudaPos(0, 0, srcZPos);
      params.srcPtr.ptr   = src[i];
      params.srcPtr.pitch = xRed * sizeof(cufftComplex);
      params.srcPtr.xsize = xRed;
      params.srcPtr.ysize = locSrcY;
      params.srcDevice    = devices[i];

      params.dstPos       = make_cudaPos(0, dstYPos, 0);
      params.dstPtr.ptr   = dst[j];
      params.dstPtr.pitch = xRed * sizeof(cufftComplex);
      params.dstPtr.xsize = xRed;
      params.dstPtr.ysize = y;
      params.dstDevice    = devices[j];

      params.extent = make_cudaExtent(xRed * sizeof(cufftComplex), locSrcY, locDstZ);

      if (cudaMemcpy3DPeer(&params) != cudaSuccess)
      {
        // handle error
      };
    }
  }
}

Best regards
David

aaryan_kaushik · October 17, 2024, 7:24pm

Hi

Can this code be used for shuffling the data in a cudaLibXtDesc descriptor? What will be the value of src and dst in that case?

Thank you very much
Aaryan

Topic		Replies	Views
cufftXtLibDesc fields and cuFFT_INTERNAL_ERROR from struct GPU-Accelerated Libraries cufft	2	16	March 29, 2025
CUFFT_INVALID_TYPE no longer in use? GPU-Accelerated Libraries cufft	0	494	February 17, 2023
CuFFT R2C 2D Batch Transforms Producing Incorrect Results GPU-Accelerated Libraries	1	2695	June 6, 2013
The multi-gpu fft 3D R2C problem GPU-Accelerated Libraries	5	659	September 9, 2019
Questions about cuFFT for 3D matrix, arrayFire GPU-Accelerated Libraries	5	1660	October 12, 2021
cufftComplex Data Issues GPU-Accelerated Libraries	3	1143	June 29, 2017
Cufft_R2C and Cufft_C2R are inaccurate GPU-Accelerated Libraries	2	1730	April 11, 2014
CUFFT bug in Cuda 4.0 Release Candidate 2 CUDA Programming and Performance	8	1637	May 5, 2011
cuFFT cufftPlan1d and cufftExecR2C issues GPU-Accelerated Libraries	4	2377	July 13, 2016
Fortran and cuFFT CUDA Programming and Performance	8	17837	September 19, 2009

cuFFTXt change descriptor XT_FORMAT manually

Related topics