I wrote two source files, named “MPI.c” and “CUDA.cu” here for simplicity, plus a header file, “cufunction.h”, that declares the functions defined in CUDA.cu. The layout is below.
////////////////////////
// MPI.c
#include "cufunction.h"
main()
{
    cudaSetDevice(rank);
    cufunction1();
    cufunction2();
}
////////////////////////
When I call cufunction1 from the MPI main function, host_temp1 comes out right: it equals the value of host.
But when I call cufunction2 from the MPI main function, host_temp2 comes out wrong: it still holds its initialized value of 0. I think the cudaMemcpy does not work in cufunction2, but there is not anything when running the program.
I hope someone can give me some tips on this one; I would be really grateful!
If you allocate device memory with cudaMalloc in cufunction1, there is a reasonable chance you are passing that device pointer value incorrectly to cufunction2. It's a fairly common type of error. But it's impossible to say based on the “pseudocode” you have shown.
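One common form of this pointer-passing mistake is sketched below. It is only an illustration under assumptions, not the poster's actual code: `allocByValue`, `allocByAddress`, and the buffer size are hypothetical names and values. The sketch shows how a device pointer passed by value stays null in the caller, and how a later cudaMemcpy from it can fail without any visible message unless its return code is inspected.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): the pointer is passed by
// value, so cudaMalloc fills in a local copy and the caller never sees it.
// The allocation itself is leaked.
cudaError_t allocByValue(float *d_buf, size_t n)
{
    return cudaMalloc((void **)&d_buf, n * sizeof(float));
}

// Hypothetical helper: the caller passes the address of its pointer, so the
// device address written by cudaMalloc is still visible after the call.
cudaError_t allocByAddress(float **d_buf, size_t n)
{
    return cudaMalloc((void **)d_buf, n * sizeof(float));
}

int main()
{
    const size_t n = 16;
    float *d_wrong = nullptr;
    float *d_right = nullptr;

    allocByValue(d_wrong, n);  // d_wrong is still nullptr in this scope
    cudaError_t ok = allocByAddress(&d_right, n);

    float host[16] = {0};
    // Copying from the null pointer is expected to return an error code, but
    // nothing is printed unless that code is checked; host[] keeps its zeros.
    cudaError_t bad = cudaMemcpy(host, d_wrong, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    printf("allocByAddress: %s\n", cudaGetErrorString(ok));
    printf("copy from d_wrong: %s\n", cudaGetErrorString(bad));

    if (ok == cudaSuccess) cudaFree(d_right);
    return 0;
}
```

If the symptom in cufunction2 has this cause, the host buffer would stay at its initial value exactly as described, which is why checking every return code is the first thing to try.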
It’s always good practice to use proper CUDA error checking. Statements of the form “there is not anything when running the program.” are immediately suspect when the code shows no sign of proper CUDA error checking.
Actually, all the pointers and arguments are defined in the header files, and those are included in MPI.c.
I have used nvprof to watch the run: the cudaMemcpy in cufunction2 is called, but its elapsed time is shorter than that of the one in cufunction1. I will take your advice and try the CUDA error checking.
There is a button in the top bar to format your code as code; please use it!
Why are you using extern “C”?
Without the declarations of your variables it is not possible to reproduce the error. I recommend using a macro that checks the return values of CUDA API functions. Something like this:
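#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap a CUDA runtime call so its return code is always inspected.
#define CUDA_CHECK(ans) do { cudaCheckErrorImpl((ans), __FILE__, __LINE__); } while (0)

inline void cudaCheckErrorImpl(cudaError_t code, const char *file, int line, bool abort = true)
{
    // Error returned directly by the wrapped API call.
    if (code != cudaSuccess)
    {
        fprintf(stderr, "Cuda Error: %s %s %d\n",
                cudaGetErrorString(code), file, line);
        if (abort) exit(EXIT_FAILURE);
    }
    // Wait for outstanding device work so that errors from earlier
    // asynchronous operations (e.g. kernel launches) surface here.
    // This is slow; keep it for debugging builds.
    cudaError_t sync = cudaDeviceSynchronize();
    if (sync != cudaSuccess)
    {
        fprintf(stderr, "Synchronous Cuda Error: %s %s %d\n",
                cudaGetErrorString(sync), file, line);
        if (abort) exit(EXIT_FAILURE);
    }
}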
...
// When you call a CUDA API function, wrap it in the macro like this
// (pass whichever cudaMemcpyKind value matches your copy direction):
CUDA_CHECK(cudaMemcpy(dst, src, size, cudaMemcpyDeviceToHost));
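A kernel launch returns no status of its own, so it cannot be wrapped in a macro the way an API call can. The sketch below shows one way to check a launch, using cudaGetLastError() for launch-time errors and cudaDeviceSynchronize() for errors raised during execution. The `scale` kernel and the `failed` helper are hypothetical names invented for this illustration, and the launch configuration is arbitrary.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, present only so there is something to launch.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: report a failed status and tell the caller about it.
static bool failed(cudaError_t err, const char *what)
{
    if (err == cudaSuccess) return false;
    fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
    return true;
}

int main()
{
    const int n = 256;
    float *d_data = nullptr;
    if (failed(cudaMalloc((void **)&d_data, n * sizeof(float)), "cudaMalloc")) return 1;
    if (failed(cudaMemset(d_data, 0, n * sizeof(float)), "cudaMemset")) return 1;

    scale<<<(n + 127) / 128, 128>>>(d_data, n, 2.0f);

    // Errors detected at launch time (e.g. an invalid launch configuration).
    if (failed(cudaGetLastError(), "kernel launch")) return 1;
    // Errors raised while the kernel was executing (e.g. an illegal address).
    if (failed(cudaDeviceSynchronize(), "kernel execution")) return 1;

    cudaFree(d_data);
    return 0;
}
```

Checking both calls after each launch separates a bad launch configuration from a fault inside the kernel, which narrows down where to look.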