Mapped Memory on a Multi-Threaded Single GPU application

akavo · August 4, 2011, 3:25pm

I have issues when using mapped memory within a multithreaded application with CUDA 4.0.

A kernel which previously worked fine in a single threaded application is now failing when it writes to mapped memory. The kernel returns cudaErrorInvalidValue and further debugging with Parallel Nsight & cuda-memcheck shows “error = access violation on store” when writing in an array of booleans and an array of pointers.

My mapped memory arrays are allocated in the main host thread but are passed to the kernel by different worker threads. I suspect this may be the source of the problem. I call cudaSetDeviceFlags(cudaDeviceMapHost); on the main thread also.

Can someone explain whether it is ok to allocate mapped memory in one thread and use it on the other? On which thread should cudaSetDeviceFlags be called?

Again I’m 99% sure this isn’t a problem with the kernel as it’s extensively tested to work in a single threaded app.

Thank you.

akavo · August 4, 2011, 5:32pm

Update: I let each thread do its own allocations but the problem persists. It’s behaviour is random.

The arrays I’m trying to write to are allocated as members of a struct which is passed by value.

MTMPL struct MMarker{

public:

	Type *h_values, *d_values;

	bool *h_found, *d_found;

	MMarker()

	{	

		cudaHostAlloc((void**) &(h_values), MAX_KEYS*sizeof(Type), cudaHostAllocMapped);

		cudaHostGetDevicePointer((void**)&(d_values), (void*)(h_values), 0);

		cudaHostAlloc((void**) &(h_found), MAX_KEYS*sizeof(bool), cudaHostAllocMapped);

		cudaHostGetDevicePointer((void**)&(d_found), (void*)(h_found), 0);

	}

	~MMarker()

	{

		cudaFreeHost((void*) h_values);

		cudaFreeHost((void*) h_found);

	}

};

MTMPLR __global__

void multi_contains_kernel(Key* keys, int k, MParams<Key,Type>* prms, MMarker<Key,Type> marker)

{

//unrelated code...

	int key_index=tid/warpSize;	//index belonging to thread

	

	bool lFound;

	if ((tid&(warpSize-1))==0 && key_index < k)

	{

		Key key=keys[key_index];		//key belonging to thread

		int sMemKeyIndex=key_index%maxKeysPerBlock;

		lFound=find<Key,Type,reversed>(key, &preds[sMemKeyIndex*MAX_HEIGHT], &succs[sMemKeyIndex*MAX_HEIGHT], prms);

		marker.d_found[key_index]=lFound; //THIS CAN FAIL

		if (lFound)

		{

			marker.d_values[key_index]=succs[sMemKeyIndex*MAX_HEIGHT]->data; //THIS CAN ALSO FAIL

		}

	}

}

I check after allocations for any errors but there are none.

Topic		Replies	Views
problem with mappe memory CUDA Programming and Performance	3	8918	March 22, 2011
Mapped memory across multiple GPUs CUDA Programming and Performance	3	8796	October 28, 2010
Multiple GPU memory address problem help CUDA Programming and Performance	6	7850	November 17, 2009
How to pass two flags to cudaHostAlloc()? CUDA Programming and Performance	5	9301	June 17, 2009
mapped memory modified by cpu thead different frrom the one that launched the gpu CUDA Programming and Performance	1	524	April 11, 2011
Error when allocating mapped memory after allocating other memory CUDA Programming and Performance	3	6736	July 4, 2010
Multi-thread memset and memcpy CUDA Programming and Performance	2	3116	June 8, 2009
error in using cuda mapped memory a test program for mapped memory CUDA Programming and Performance	0	1013	April 20, 2010
Mapped Pinned Memory Question CUDA Programming and Performance	0	993	February 16, 2010
mapped memory with C1060 CUDA Programming and Performance	3	662	October 25, 2010

Mapped Memory on a Multi-Threaded Single GPU application

Related topics