Retrieving data from device to host memory while the computer is rendering OpenGL graphics through the same GPU

I have two separate applications that use an object of the same custom class, which I wrote to encapsulate CUDA and CUBLAS function calls and structures. When I run the following code snippet in a console application, I can access the device pointer without any problems.

theCudaObj->CopyDispFromGPU(55, thu);
theCudaObj->CopyForcesToGPU(55, &f[55]);

with

int CudaObj::CopyForcesToGPU( int node, const void *src, size_t count /*= 3*sizeof(*forces)*/ )
{
	// check that the copy stays within the allocation
	// (node is an element index, count is in bytes)
	if (node*sizeof(*forces) + count > dim*sizeof(*forces)) return -1;
	state = cudaMemcpy(&forces[node], src, count, cudaMemcpyHostToDevice);
	if (state != cudaSuccess) {
		printf("Error copying to video memory: %s\n", cudaGetErrorString(state));
		return -1;
	}
	return 0;
}

int CudaObj::CopyDispFromGPU( int node, void *dst, size_t count /*= 3*sizeof(float)*/ )
{
	// check that the copy stays within the allocation
	// (node is an element index, count is in bytes)
	if (node*sizeof(*disp) + count > dim*sizeof(*disp)) return -1;
	state = cudaMemcpy(dst, (const void*) &disp[node], count, cudaMemcpyDeviceToHost);
	if (state != cudaSuccess) {
		printf("Error copying from video memory: %s\n", cudaGetErrorString(state));
		return -1;
	}
	return 0;
}

When I call these functions in an MFC application whose view windows render graphics using OpenGL, the same cudaMemcpy() calls fail. My guess is that the application's graphics rendering is blocking the transfer between device and host. Would turning off OpenGL help resolve this issue (I'm in the process of figuring out how to easily turn off OpenGL rendering when the application needs the GPU for CUDA operations)? What other alternatives are there to get around this problem? Thanks in advance for your advice.

There shouldn't be any problem using OpenGL and CUDA in the same application (many of our samples do this). Are you sure you're calling cudaMemcpy from the same thread that holds the CUDA context?

Thanks for your comment about the threads. I realize I have indeed run into issues managing my application's threads to work with CUDA. Below is what I originally did for construction and destruction of the threads/objects:

Construction:

Destruction:

Strictly speaking, I created the CUDA context in thread #1 while constructing thread #2, which leads to the error when I try to copy the memory contents. I have tried moving the creation of the CUDA context to the first time thread #2's main function is resumed, as follows:

int InsertionSim::main()
{
#if USECUDA
	gpu_vars = new CudaObj(this);
#endif
	while (1) {
		if(theApp->activeUserControl->GetSimMode() == USE_TRAJ ||
		   theApp->activeUserControl->GetSimMode() == MOUSE_ONLY)
			theApp->dataReady->Wait();
		if (TerminateStatus())
			break;
		try
		{
			// irrelevant code
		}
		catch (int e)
		{
			// error handling code
		}
		if(theApp->activeUserControl->GetSimMode() == USE_TRAJ ||
		   theApp->activeUserControl->GetSimMode() == MOUSE_ONLY)
			// this if statement is added to skip signaling the parent thread
			// when it is already signaled in between switching from
			// haptics to non-haptics mode
			if (!theApp->dataProcessed->Read())
				theApp->dataProcessed->Signal();
	}
#if USECUDA
	delete gpu_vars;	gpu_vars = NULL;
#endif
	return 0;
}

However, done this way, I run into errors on application exit, in the line of CudaObj's destructor where cudaFree() is called. Because of that error, the memory cleanup code that follows the CUDA context cleanup never executes, leaving behind a mess of memory leaks.

If I create the CUDA context in the main function as shown above, but leave the destruction in the InsertionSim destructor, which is called from thread #1, I run into problems freeing the CUDA memory that was created in thread #2's main function. Given this, what would you recommend? Thanks in advance.