Hi,
I am trying to take advantage of multiple graphics cards, though my tests are currently running with just one. The outline of the program is:
[codebox]
Allocate memory on the host and load in the raw data.
Spawn a thread for each card; each thread will:
    initialise the card and allocate memory on it
    forall tiles {
        copy a tile onto the card from the host using cudaMemcpy3DAsync
        run the analysis on this tile
        copy the results back
    }
End threads.
Compare the results with those generated from the gold function.
[/codebox]
As I am using streams, it is necessary to use [font=“Courier New”]cudaMallocHost[/font] to allocate the host memory for the raw data and the results. However, it appears that memory allocated this way is only visible to the thread it was allocated in, even though the pointer to it is a global variable. The code all runs fine in emulation mode. I have experimented with moving the allocation into different parts of the code, and any function that uses the memory from a different thread fails: [font=“Courier New”]cudaMemcpy3DAsync[/font] returns an invalid-argument error, and direct access from host code gives a segmentation fault (for example, the comparison against the gold results when the memory is allocated inside a spawned thread).
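To make the failure mode concrete, here is a simplified sketch of the pattern (names, sizes, and the 1-D [font=“Courier New”]cudaMemcpyAsync[/font] stand in for my actual code and the 3-D copy):

[codebox]
#include <cuda_runtime.h>
#include <pthread.h>

#define TILE_BYTES (1 << 20)   /* placeholder tile size */

float *h_raw = NULL;           /* global pointer, pinned in the main thread */

void *worker(void *arg)
{
    int device = *(int *)arg;
    cudaSetDevice(device);     /* each thread gets its own context */

    float *d_tile;
    cudaMalloc((void **)&d_tile, TILE_BYTES);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    /* This async copy fails with an invalid-argument error when h_raw
       was pinned with cudaMallocHost in a *different* thread. */
    cudaMemcpyAsync(d_tile, h_raw, TILE_BYTES,
                    cudaMemcpyHostToDevice, stream);
    /* ... kernel launch, copy results back ... */
    return NULL;
}

int main(void)
{
    cudaMallocHost((void **)&h_raw, TILE_BYTES);  /* pinned in main thread */
    /* spawn one worker per card with pthread_create, then join ... */
}
[/codebox]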
Is this the expected behaviour? Does [font=“Courier New”]cudaMallocHost[/font] only allow the memory to be used by the thread that called it?
If so, is there a workaround? I realise I could allocate a single pinned tile per thread and copy the data into it from memory allocated with the standard [font=“Courier New”]malloc[/font], but this seems very inefficient.
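By which I mean something like the following (again simplified to a 1-D copy; the per-tile host-to-host memcpy is the part that worries me):

[codebox]
/* Staging workaround (sketch): each thread pins one tile-sized buffer
   of its own and copies into it from the plain malloc'd raw data. */
float *h_stage;
cudaMallocHost((void **)&h_stage, TILE_BYTES);  /* pinned by this thread */

for (int t = 0; t < num_tiles; ++t) {
    memcpy(h_stage, h_raw + t * tile_elems, TILE_BYTES);  /* extra copy */
    cudaMemcpyAsync(d_tile, h_stage, TILE_BYTES,
                    cudaMemcpyHostToDevice, stream);
    cudaStreamSynchronize(stream);  /* can't reuse h_stage until done */
    /* ... run analysis, copy results back ... */
}
[/codebox]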
If it is not the expected behaviour, does anyone have any ideas what I am doing wrong?
Many thanks in advance
Daniel