Multiple GPUs host thread safety?

I’m running a simple test on a machine with two GTX 275 cards, just to check that I can work with both GPUs. Here’s the code:

#include <cuda_runtime.h>
#include <cstdio>

#include <QThread>
#include <windows.h>

void testAlloc(int dev)
{
	cudaError ret;
	int N = 256;
	cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
	cudaExtent extent = make_cudaExtent(N, N, N);

	size_t avail = 0, total = 0;
	ret = cudaMemGetInfo(&avail, &total);
	printf("%d: avail = %zu, total = %zu\n", dev, avail, total);

	cudaArray *arr;
	ret = cudaMalloc3DArray(&arr, &channelDesc, extent);

	size_t avail2 = 0, total2 = 0;
	ret = cudaMemGetInfo(&avail2, &total2);
	printf("%d: avail = %zu, total = %zu\n", dev, avail2, total2);
	printf("%d: alloc = %zu, asked = %zu\n", dev, avail - avail2, N * N * N * sizeof(float));
}

class TestMalloc : public QThread
{
public:
	TestMalloc(int device) : m_device(device) {}

	virtual void run()
	{
		cudaError ret;
		ret = cudaSetDevice(m_device);

		int dev = -1;
		cudaGetDevice(&dev);
		printf("dev=%d\n", dev);
		testAlloc(dev);
	}

	int m_device;
};

int main(int argc, char *argv[])
{
	TestMalloc tm0(0);
	TestMalloc tm1(1);

	tm0.start();
	tm1.start();

	tm0.wait();
	tm1.wait();

	return 0;
}

For threading I use the Qt framework (QThread).

When I run this, the thread for device 0 runs fine, but the thread for device 1 deadlocks in its first call to cudaMemGetInfo().

If I add a Sleep(m_device*100) at the beginning of run(), so that the first thread finishes before the second starts, everything works fine.

So what’s going on here? Is the CUDA API not thread safe? Do I need to start guarding every CUDA call with a mutex?

The runtime API is thread safe - there are plenty of examples floating around using either pthreads or Boost threads which work perfectly. The only thing you have to be careful about is that a context is bound to both a device and a thread, and has the lifetime of the thread that established it. So you need to make sure each thread only ever interacts with its own GPU, and that you keep each thread alive for as long as you need the context it holds.

I am not familiar with the Qt threading API, so it is hard to tell what the start() and wait() methods your code calls actually do (or where the run() method that interacts with the GPU is called from).

What happens if you remove the cudaMemGetInfo calls from the testAlloc function? Does that remove the deadlock?

eyal

What actually removes the deadlock is adding:

	int *dummy = NULL;
	cudaMalloc(&dummy, 10);
	cudaFree(dummy);

at the very beginning of testAlloc().

I guess this establishes the thread's context properly, and then cudaMemGetInfo() can do its job.

The only question that remains is why cudaMemGetInfo() doesn't establish the context itself.
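For what it's worth, a commonly used way to force lazy context creation is `cudaFree(0)`, which avoids a real allocation. A minimal sketch (assuming the CUDA toolkit headers are available; `initDeviceContext` is my own name for the helper):

```cpp
#include <cuda_runtime.h>

// Forces lazy context creation on the chosen device. cudaFree(0) allocates
// nothing, but it goes through the runtime's initialization path, so later
// calls such as cudaMemGetInfo() find a live context. Must be called from
// the thread that will use the device.
void initDeviceContext(int device)
{
	cudaSetDevice(device);
	cudaFree(0);
}
```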

For those struggling with the threading API for some reason: run() is the code that the thread runs, start(), surprisingly enough, starts the thread, and wait() waits for it to finish.

That's obviously not a good solution… :)

Try putting a cudaSetDevice call in place of the dummy code.

Anyway, cudaMemGetInfo is a CUTIL function (which is not supported) and requires a context before being called.

If I remember correctly, in my code, if I don't initialize CUDA before calling cudaMemGetInfo, I get zero as the card's memory.

eyal

cudaMemGetInfo() is in fact not a CUTIL function. It is in the cudart DLL and is documented in the official CUDA runtime documentation.
That documentation says nothing about requiring a valid context, and in fact, for a user of the cudart API, the concept of contexts should be transparent.
This is clearly a bug in cudart.

I stand corrected. You are right.