Is pinned memory possible in mixed C++ and CUDA code?

All

Can I allocate pinned memory in a .cpp file and access it within a .cu file? I am using VS2005 as my C++ compiler, CUDA 2.1, Windows XP 32-bit, and a 9800GT. Look at the following code … If I use the commented-out allocation in the .cpp file, everything works. But if I use cudaMallocHost instead, CUFFT fails. Any ideas?

// "test.cpp"
extern "C" void GPU_2DFFTa(const int argc, const char** argv,
                           complex<float>* data, unsigned int n1, unsigned int n2);

int _tmain(int argc, _TCHAR* argv[])
{
    long N1 = 1024;
    long N2 = N1;

    // Pinned (page-locked) host allocation: CUFFT fails with this
    complex<float> *xa;
    cutilSafeCall(cudaMallocHost((void**)&xa, N1 * N2 * sizeof(complex<float>)));

    // Pageable allocation: everything works with this
    //complex<float> *xa = new complex<float>[N1*N2];
    //for(int is = 0; is < (N1*N2); is++)
    //    xa[is] = complex<float>(is,is);

    GPU_2DFFTa(argc, (const char**)argv, xa, N1, N2);
    return 0;
}

// "test1.cu"
extern "C" void
GPU_2DFFTa(const int argc, const char** argv, cuComplex* data, unsigned int n1, unsigned int n2)
{
    // size in bytes of the n1 x n2 complex array
    const unsigned int mem_size = sizeof(cuComplex) * n1 * n2;

    // allocate device memory
    cuComplex* d_data;
    cutilSafeCall(cudaMalloc((void**) &d_data, mem_size));

    // copy host memory to device
    cutilSafeCall(cudaMemcpy(d_data, data, mem_size,
                             cudaMemcpyHostToDevice));

    // FFT
    // CUFFT plan
    cufftHandle plan;
    cufftSafeCall(cufftPlan2d(&plan, n1, n2, CUFFT_C2C));

    // Transform signal and kernel (IN PLACE)
    cufftSafeCall(cufftExecC2C(plan, (cufftComplex *)d_data, (cufftComplex *)d_data, CUFFT_FORWARD));
}

Very strange. I’m definitely using cudaMallocHost in my C++ code and passing the pointers to .cu files without any problems.

The only potential issues I’m aware of that can crop up are:

  1. You need to call cudaMallocHost in the same thread as the one calling the .cu file functions to get the pinned memory performance.

  2. You need to call cudaMallocHost in the same thread as the one calling any cudaMemcpyAsync on that memory.
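
A minimal sketch of the two points above, assuming only the CUDA runtime API: the pinned buffer is allocated with cudaMallocHost and then passed to cudaMemcpyAsync from the same host thread. The CHECK macro, the buffer size, and the variable names are illustrative and not taken from the code in question.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a readable message if a runtime call fails.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s failed: %s\n", #call,            \
                    cudaGetErrorString(err_));                   \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

int main()
{
    const size_t n = 1024 * 1024;            // illustrative element count
    const size_t bytes = n * sizeof(float);

    float* h_pinned = NULL;
    float* d_buf = NULL;
    cudaStream_t stream;

    // Everything below runs on a single host thread.
    CHECK(cudaMallocHost((void**)&h_pinned, bytes));  // page-locked host memory
    CHECK(cudaMalloc((void**)&d_buf, bytes));
    CHECK(cudaStreamCreate(&stream));

    for (size_t i = 0; i < n; ++i)
        h_pinned[i] = (float)i;

    // Asynchronous copy from pinned memory; the call returns before the copy finishes.
    CHECK(cudaMemcpyAsync(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice, stream));
    CHECK(cudaStreamSynchronize(stream));    // wait for the copy to complete

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d_buf));
    CHECK(cudaFreeHost(h_pinned));
    return 0;
}
```

If a different thread had to issue the copy, the allocation would need to move to that thread as well.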

Hmm, can you see anything wrong with the code snippet above? I am not even launching my own threads; I am just using CUFFT. Can you provide some example code that works for you?

Sorry, I don’t see anything obviously wrong. Do you get a segfault or other useful error when running in debug mode?
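
One way to get a more useful error is to check each return code by hand instead of going through the cutil wrappers. The following is only a sketch under that assumption: the function name fft2d_report_errors and its integer return codes are made up for illustration, while the runtime and CUFFT calls are the same ones used in the code in question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical variant of the wrapper: returns 0 on success, or the number
// of the step that failed, and prints the error it got back.
extern "C" int fft2d_report_errors(cufftComplex* h_data,
                                   unsigned int n1, unsigned int n2)
{
    const size_t bytes = sizeof(cufftComplex) * n1 * n2;
    cufftComplex* d_data = NULL;

    cudaError_t err = cudaMalloc((void**)&d_data, bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    err = cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy: %s\n", cudaGetErrorString(err));
        cudaFree(d_data);
        return 2;
    }

    cufftHandle plan;
    cufftResult res = cufftPlan2d(&plan, (int)n1, (int)n2, CUFFT_C2C);
    if (res != CUFFT_SUCCESS) {
        fprintf(stderr, "cufftPlan2d: cufftResult %d\n", (int)res);
        cudaFree(d_data);
        return 3;
    }

    // In-place forward transform on the device buffer.
    res = cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);
    if (res != CUFFT_SUCCESS)
        fprintf(stderr, "cufftExecC2C: cufftResult %d\n", (int)res);

    cufftDestroy(plan);
    cudaFree(d_data);
    return (res == CUFFT_SUCCESS) ? 0 : 4;
}
```

Calling this once with the cudaMallocHost buffer and once with the new[] buffer would show which step behaves differently between the two allocations.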