I ran into a problem mixing OpenACC and CUDA: the output is inconsistent

Hi Mat, I’m having issues mixing OpenACC with CUDA. Here’s the code:
First, I read the filter data from a file and run an FFT on it. The CPU and GPU buffers have already been allocated in advance.

Readdata("b_200_10k.txt",bandpass_b,Element->J_bandpass+1);                       
#pragma acc update device(bandpass_b[0:Element->J_bandpass+1])
ManyFFT_D2C(plan_b_bandpass,bandpass_b,bandpass_b_fft);

The result is then used in the following function:

#pragma acc kernels present(b_fft[0:Fs],input_fft[0:M_Array*Fs],output_fft[0:M_Array*Fs])    
#pragma acc loop independent collapse(2)                     
for(int i=0;i<M_Array;i++)
{
  for(int j=0;j<Fs;j++)
    {
        output_fft[i*Fs+j]=cuCmulf(input_fft[i*Fs+j],b_fft[j]);          
    }
}
cudaDeviceSynchronize();                                                
#pragma acc update self(output_fft,input_fft,b_fft)
for(int i=100;i<500;i++)
{
cout<<b_fft[i].x<<"--"<<b_fft[i].y<<input_fft[i].x<<"--"<<input_fft[i].y<<endl;
}

The compile command is:

nvc++ -acc -gpu=managed -cuda -cudalib=cufft,cublas -std=c++17 -Minfo=accel  -lcudart -fPIC -fast -Xcomplier   -shared -o libprocess.so libprocess.cpp

When I run the above code repeatedly, b_fft is sometimes correct and sometimes contains NaN, Inf, or other wrong values.
My platform information is as follows:

Sun Jan 21 23:02:20 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02 Driver Version: 535.146.02 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 … Off | 00000000:01:00.0 On | N/A |
| N/A 43C P8 16W / 80W | 484MiB / 6144MiB | 4% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1255 G /usr/lib/xorg/Xorg 214MiB |
| 0 N/A N/A 1551 G /usr/bin/gnome-shell 101MiB |
| 0 N/A N/A 3414 G …irefox/2987/usr/lib/firefox/firefox 121MiB |
| 0 N/A N/A 10049 G …sion,SpareRendererForSitePerProcess 37MiB |
+---------------------------------------------------------------------------------------+

Unfortunately there’s not enough information here to determine the cause of the inconsistent results. Are you able to post or send me a reproducing example so I can investigate?

Your “update self” directive is missing the array bounds information, but given you’re using managed memory, I doubt this is the problem. But just in case, try adding the bounds info.

#pragma acc update self(output_fft[:SIZE],input_fft[:SIZE],b_fft[:SIZE])

Note: replace “SIZE” with the actual size of each array.
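
To make the bounds suggestion concrete, here is a minimal sketch of an element-wise multiply followed by an “update self” with explicit bounds. It is not your code: the function name scale_rows, the constants M_ARRAY and FS, and the use of plain float instead of cuComplex are all assumptions made for illustration, and it manages data with an explicit data region rather than managed memory. Bounds are written as [start:length], so a full array is [0:N].

```cpp
#include <cassert>

// Hypothetical sizes, chosen only for illustration.
constexpr int M_ARRAY = 4;   // number of rows (channels)
constexpr int FS = 8;        // elements per row

// Multiplies every row of `in` element-wise by `b` and writes the result to `out`.
// Without an OpenACC compiler the pragmas are ignored and the loop runs on the host.
void scale_rows(const float* in, const float* b, float* out) {
    assert(in != nullptr && b != nullptr && out != nullptr);

    // Explicit data region: copy the inputs to the device, allocate the output there.
    #pragma acc data copyin(in[0:M_ARRAY*FS], b[0:FS]) create(out[0:M_ARRAY*FS])
    {
        #pragma acc parallel loop collapse(2) present(in[0:M_ARRAY*FS], b[0:FS], out[0:M_ARRAY*FS])
        for (int i = 0; i < M_ARRAY; i++) {
            for (int j = 0; j < FS; j++) {
                out[i * FS + j] = in[i * FS + j] * b[j];
            }
        }

        // The arrays are plain pointers here, so the bounds are spelled out
        // to state exactly how many elements are copied back to the host.
        #pragma acc update self(out[0:M_ARRAY*FS])
    }
}
```

Calling scale_rows(in, b, out) with host arrays of M_ARRAY*FS, FS, and M_ARRAY*FS floats should leave the products in out on the host once the function returns. In your case the sizes would presumably be M_Array*Fs for input_fft and output_fft and Fs for b_fft, matching the present clause you already have.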