Zero-copy host-to-device mapping: CUDA pointer cannot see host memory defined in Fortran

Hi,

I am trying to learn CUDA's zero-copy feature, which lets the device access mapped host memory directly.

My main program is in Fortran. In C, I initialized a pointer (a global variable visible to all the C functions) with mapped host memory allocated. In Fortran, the pointer (technically, the array declared in Fortran) was assigned values. In the subsequent C functions, I tried to use cudaHostGetDevicePointer to map the host pointer to the device. However, CUDA does not seem to accept this: cudaHostGetDevicePointer fails and returns error code 11.

I think this is understandable, since the same memory is also declared by the Fortran code, which might be the problem. So it seems that, for the memory mapping to work, I have to keep all my C pointers hidden from the Fortran code. I tried that, and it worked properly.

I am just curious whether there is a clever way to trick either the Fortran or the C code so that the CUDA device pointer can directly map the array declared in Fortran. Thanks!

Yu

I wrote an example some time ago on how to use pinned memory in Fortran:

http://developer.download.nvidia.com/compu…n_Cuda_Blas.tgz

You need to use iso_c_binding to map C pointers to Fortran arrays.

res = cudaMallocHost ( cptr_A, m1*m2*sizeof(fp_kind) )
call c_f_pointer ( cptr_A, A, (/ m1, m2 /) )

After this call, you could use A as a normal Fortran array.

If you have a big Fortran project, you should take a look at CUDA Fortran; it is really nice and clean.

Thanks for the reply!

It looks like a good solution, and I will give it a try.

Hi,

I got a chance to rewrite the program using iso_c_binding, but I still get an error message when using pinned memory.

I am not quite sure whether I did something wrong here.

In my Fortran code, I initially call a C function

int initmap_()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    if (!prop.canMapHostMemory) exit(0);
    cudaSetDeviceFlags(cudaDeviceMapHost);
    return 1;
}

to initialize the device for mapped pinned memory.

Then I map the Fortran arrays A (input) and C (output) onto the C pointers cptr_A and cptr_C, each holding N double-precision values:

res = cudaMallocHost(cptr_A,N*sizeof(fp_kind))
call c_f_pointer(cptr_A,A,(/N/))
res = cudaMallocHost(cptr_C,N*sizeof(fp_kind))
call c_f_pointer(cptr_C,C,(/N/))

Then array A is assigned values in Fortran, and arrays A and C are passed to a C host function as arguments.

In the C host function, I have

double *d_a,*d_c;

statusa=cudaHostGetDevicePointer((void **)&d_a, (void *)a, 0);
statusc=cudaHostGetDevicePointer((void **)&d_c, (void *)c, 0);

if (statusa != 0 || statusc !=0) {
    printf("Error when locating memory to arrays on device!\n");
    printf("%d,%d\n",statusa,statusc);
    return EXIT_FAILURE;
}

But this returns error code 11.
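
A bare number is hard to interpret, and as far as I know the numeric values of CUDA error codes have changed between toolkit versions (in older toolkits, 11 appears to have been cudaErrorInvalidValue). A minimal sketch of printing the name and description instead is below. The helper report_cuda is a hypothetical name introduced only for this illustration; cudaGetErrorName and cudaGetErrorString are runtime API calls, though cudaGetErrorName may be missing from very old toolkits.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* Hypothetical helper (not part of the original code): print a CUDA status
   by name and description. Returns 0 on success, 1 on failure. */
static int report_cuda(cudaError_t status, const char *what)
{
    if (status == cudaSuccess) return 0;
    fprintf(stderr, "%s failed: %s (%s)\n", what,
            cudaGetErrorName(status), cudaGetErrorString(status));
    return 1;
}

int main(void)
{
    /* Feed the helper a known error value to show what the message looks like;
       in real code the status would come from cudaHostGetDevicePointer. */
    report_cuda(cudaErrorInvalidValue, "cudaHostGetDevicePointer");
    return 0;
}
```

In the host function above, the same helper could wrap statusa and statusc in place of the printf of the raw integers.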

Any further suggestions? Thanks!

From a quick scan of your code, you are using cudaMallocHost. You should use cudaHostAlloc with the cudaHostAllocMapped flag to enable zero copy.
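
For reference, the C side of a zero-copy setup might look roughly like the sketch below. It is only an illustration under stated assumptions, not the code from the attachments in this thread: the kernel name exp_kernel, the CHECK macro, the array size, and the launch configuration are all made up for the example, and it needs a device that supports mapped host memory and double precision.

```cuda
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Illustrative macro (an assumption for this sketch): abort on any CUDA error. */
#define CHECK(call)                                                       \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(err_)); \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

/* Illustrative kernel: c[i] = exp(a[i]), reading and writing mapped host memory. */
__global__ void exp_kernel(const double *a, double *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = exp(a[i]);
}

int main(void)
{
    const int n = 1024;
    double *h_a, *h_c;   /* host pointers to mapped pinned memory */
    double *d_a, *d_c;   /* device aliases of the same memory */
    cudaDeviceProp prop;

    CHECK(cudaGetDeviceProperties(&prop, 0));
    if (!prop.canMapHostMemory) {
        fprintf(stderr, "Device 0 cannot map host memory\n");
        return EXIT_FAILURE;
    }
    /* Must be set before the first allocation or kernel launch. */
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    /* cudaHostAllocMapped is what makes the memory visible to the device. */
    CHECK(cudaHostAlloc((void **)&h_a, n * sizeof(double), cudaHostAllocMapped));
    CHECK(cudaHostAlloc((void **)&h_c, n * sizeof(double), cudaHostAllocMapped));
    for (int i = 0; i < n; ++i) h_a[i] = (double)i / n;

    CHECK(cudaHostGetDevicePointer((void **)&d_a, h_a, 0));
    CHECK(cudaHostGetDevicePointer((void **)&d_c, h_c, 0));

    exp_kernel<<<(n + 255) / 256, 256>>>(d_a, d_c, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());   /* results are only valid after this */

    double max_err = 0.0;
    for (int i = 0; i < n; ++i) {
        double e = fabs(h_c[i] - exp(h_a[i]));
        if (e > max_err) max_err = e;
    }
    printf("max abs error = %g\n", max_err);

    CHECK(cudaFreeHost(h_a));
    CHECK(cudaFreeHost(h_c));
    return 0;
}
```

No cudaMemcpy appears anywhere: the kernel works on the host buffers through d_a and d_c. From Fortran, the host pointers returned by cudaHostAlloc would be the ones handed to c_f_pointer.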

If you put together a self-contained code and a makefile, I will take a look.

Thanks a lot!

I probably will not be able to access the code and the video card this week, but I will try again once I have a chance.

Hi,

After I replaced cudaMallocHost with cudaHostAlloc, the code fails with a segmentation fault during the initial memory allocation.

I am attaching the code and a makefile (bash shell commands). I would very much appreciate it if you could take a look at your convenience.

The code is quite a simple test: I just wanted to try parallelizing the exponential function. To reduce the time spent on memory operations, I wanted to try pinned memory. Thanks very much!

Yu

Fortran_CUDA_Exp_Cbinding.zip (10.3 KB)

This is a working example of how to use zero copy from Fortran.

More details at:
http://cudamusing.blogspot.com/2010/07/usi…om-fortran.html
ZeroCopyFortran.tar (8 KB)

Great, Thanks so much!
