Reporting a problem with CUDA memory access in multiple OS threads

Platform specification: WinXP, VS .NET 2003, CUDA SDK 0.81, Core 2 Duo E6600 (2.4G/4M/1066) + 2G DDR 667 (1G*2) + NVIDIA G80.

I tried allocating memory with cudaMalloc() in one OS thread and accessing it from another thread, but the access always fails.

The sample code is below. Could someone try it and tell me the results? Are there any suggestions on how to solve this problem? Thank you in advance.

#define _MT // required for the thread-related APIs

#include <windows.h>
#include <process.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cutil.h> // CUDA_SAFE_CALL (from the SDK)

float* h_idata; // host buffer
float* d_idata; // device buffer, allocated in the main thread

unsigned int __stdcall test_func(void* pArguments)
{
    // both memory copy calls fail when run from this thread…
    CUDA_SAFE_CALL( cudaMemcpy( d_idata, h_idata, 1024, cudaMemcpyHostToDevice) );
    CUDA_SAFE_CALL( cudaMemcpy( h_idata, d_idata, 1024, cudaMemcpyDeviceToHost) );

    _endthreadex(0);
    return 0;
}

int main( int argc, char** argv)
{
    h_idata = (float*) malloc( 1024);
    CUDA_SAFE_CALL( cudaMalloc( (void**) &d_idata, 1024));

    // run the copies in a second OS thread and wait for it to finish
    unsigned threadID;
    HANDLE hThread = (HANDLE)_beginthreadex( NULL, 0, &test_func, NULL, 0, &threadID);

    WaitForSingleObject( hThread, INFINITE );
    CloseHandle( hThread );

    return 0;
}

The debugger shows the following run-time messages when execution reaches the cudaMemcpy() line:
First-chance exception at 0x7c812a5b in template.exe: Microsoft C++ exception: bool @ 0x03a2ff0f.
First-chance exception at 0x7c812a5b in template.exe: Microsoft C++ exception: [rethrow] @ 0x000000000.
Unhandled exception at 0x7c812a5b in template.exe: Microsoft C++ exception: bool @ 0x03a2ff0f.

Note that everything works if the cudaMalloc() call comes directly before the cudaMemcpy() calls (i.e. in the same thread).

What failure message did you get?

I have clarified the description in the original post. Thanks.

Yes, you can’t do that! See section 4.5.3.3 - there is a 1:1 correspondence between contexts and host threads. Perhaps a few words could be added to the manual to emphasise this goes both ways.
Cheers, Eric

I see. Thank you very much. In fact, I had not read that part, since I only used the runtime API…
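
For anyone who needs several host threads to work with the same device memory under this constraint, one possible workaround is to keep every CUDA call on a single dedicated thread and have the other threads hand it work. The following is only a sketch of that idea in plain standard C++. It uses std::thread rather than _beginthreadex, so it needs a C++11 compiler, which VS .NET 2003 is not. The names DeviceWorker and tasks_share_one_thread are made up for illustration, and the CUDA calls appear only as comments, so nothing here has been tested against the SDK.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical helper: a single thread that runs every submitted task, so all
// device calls placed inside those tasks are issued from the same host thread.
class DeviceWorker {
public:
    DeviceWorker() : stop_(false), thread_([this] { run(); }) {}

    ~DeviceWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
        thread_.join();  // remaining queued tasks are drained before exit
    }

    // Queue a task; the returned future becomes ready once the task has run.
    std::future<void> submit(std::function<void()> task) {
        auto packaged =
            std::make_shared<std::packaged_task<void()>>(std::move(task));
        std::future<void> done = packaged->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push([packaged] { (*packaged)(); });
        }
        cv_.notify_one();
        return done;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stop requested, queue is empty
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // executed outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stop_;
    std::thread thread_;  // declared last: the members above must exist first
};

// Returns true if a task submitted from the main thread and a task submitted
// from a second thread both executed on the worker thread.
bool tasks_share_one_thread() {
    DeviceWorker worker;
    std::thread::id alloc_thread, copy_thread;

    // In real code this task would call cudaMalloc().
    worker.submit([&] { alloc_thread = std::this_thread::get_id(); }).get();

    // A different host thread asks for the copy; in real code this task
    // would call cudaMemcpy(), still executed by the worker thread.
    std::thread other([&] {
        worker.submit([&] { copy_thread = std::this_thread::get_id(); }).get();
    });
    other.join();

    return alloc_thread == copy_thread &&
           alloc_thread != std::this_thread::get_id();
}
```

Calling tasks_share_one_thread() checks that both tasks ran on the worker thread even though they were submitted from two different threads. Waiting on the returned future with get() is what lets the submitting thread know the allocation or copy has finished before it touches the host buffer.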