MultiGPU example in the CUDA SDK some stack problems

youquanliu · April 23, 2007, 4:18pm

Hi,

I think I have to ask here to get the problem solved.

In the MultiGPU example from the CUDA SDK, it seems that device memory can only be allocated inside the thread, that is, inside the gpuThread function. If I allocate the device memory for each GPU before I create the threads, the whole program hangs.

Is there a solution for this kind of problem? I would prefer to allocate all the memory I need before I use CUDA to compute anything.

Is it a stack problem?

Many thanks,
YQ

prkipfer · April 23, 2007, 4:27pm

You can do the allocation whenever you like, just make sure to talk to the correct device! The demo ensures this by doing the allocs in the thread that has the correct context info. You can pull that out of the thread code and do the init beforehand, as long as you switch to the right context first.

I personally would dislike your “global” approach. It is much cleaner the way it is done in the demo because the resources are local to each GPU after all.

Peter

youquanliu · April 23, 2007, 6:41pm

I tried, but it failed.

When I print out the memory addresses, I think I can see the problem.

If I allocate the memory in gpuThread, the same variable gets the same address on GPU0 and GPU1. However, if I allocate the memory before starting the threads, the same variable gets a different address on each GPU. I promise that whenever I allocate memory I call cudaSetDevice first to make sure the allocation goes to the intended device.

If you are interested, I can send you the code.

Thanks,

YQ

prkipfer · April 24, 2007, 12:43pm

Are you sure that you are not accidentally overwriting a thread-global variable with the addresses returned from cudaMalloc? If you get the device addresses before the thread fork, you’ll need individual variables for them.

Peter

youquanliu · April 24, 2007, 2:17pm

Thanks. I am posting some code here; could you take a look?

static void allocateDataToCUDA(CUDAThreadDataType* threadData)
{
    CUDA_SAFE_CALL(cudaSetDevice(threadData->Id));
    CUDA_SAFE_CALL(cudaMalloc((void**)&threadData->dA, matrix_size));
    CUDA_SAFE_CALL(cudaMemcpy(threadData->dA, threadData->hA, matrix_size, cudaMemcpyHostToDevice));
}

static CUT_THREADPROC gpuThread(CUDAThreadDataType* data)
{
    CUDA_SAFE_CALL(cudaSetDevice(data->Id));

    // Invoke kernel on this device.
    Kernel<<<XXX, XXX>>>();

    CUT_THREADEND;
}

Before the threads are started, I call it like this:

for (i = 0; i < s_gpuCount; i++)
{
    threadData[i].Id = i;
    threadData[i].hA = A;
    allocateDataToCUDA(&threadData[i]);
}

Then I start the threads:

for (i = 0; i < s_gpuCount; i++)
{
    threads[i] = cutStartThread((CUT_THREADROUTINE)gpuThread, (void *)&threadData[i]);
}
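
For comparison, here is a minimal sketch of the per-thread layout Peter recommends, where each host thread selects its GPU and then allocates, copies, launches, and frees within that same thread. In the CUDA releases available when this thread was written (before CUDA 4.0), the runtime tied a context to the host thread that created it, so memory allocated from the main thread was not usable from a worker thread; that may well be what is happening with the code above, though the thread does not confirm it. The sketch assumes a current toolkit and swaps the SDK's cutil helpers (cutStartThread, CUDA_SAFE_CALL) for std::thread and plain error checks. GpuTask, scaleKernel, and the buffer size are hypothetical names and values chosen for illustration.

```cuda
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical per-GPU record, loosely modeled on CUDAThreadDataType above.
struct GpuTask {
    int          id;      // device ordinal passed to cudaSetDevice
    const float* hA;      // host input, shared read-only by all threads
    float*       dA;      // device buffer, owned by this task's thread
    std::size_t  count;   // number of floats in hA / dA
    cudaError_t  status;  // first error seen by the thread, or cudaSuccess
};

// Placeholder kernel (an assumption): multiplies every element by s.
__global__ void scaleKernel(float* a, std::size_t n, float s)
{
    std::size_t i = blockIdx.x * static_cast<std::size_t>(blockDim.x) + threadIdx.x;
    if (i < n) a[i] *= s;
}

// Everything that touches the device happens inside the worker thread.
static void gpuThread(GpuTask* t)
{
    const std::size_t bytes = t->count * sizeof(float);
    t->dA = nullptr;

    // Select the device first, then allocate and upload on that device.
    cudaError_t err = cudaSetDevice(t->id);
    if (err == cudaSuccess) err = cudaMalloc((void**)&t->dA, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(t->dA, t->hA, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        const unsigned threads = 256;
        const unsigned blocks  = static_cast<unsigned>((t->count + threads - 1) / threads);
        scaleKernel<<<blocks, threads>>>(t->dA, t->count, 2.0f);
        err = cudaGetLastError();  // reports launch-configuration errors
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // wait for the kernel

    if (t->dA != nullptr) cudaFree(t->dA);
    t->status = err;
}

int main()
{
    int gpuCount = 0;
    if (cudaGetDeviceCount(&gpuCount) != cudaSuccess || gpuCount == 0) {
        std::printf("no CUDA device found\n");
        return 0;
    }

    std::vector<float> A(1 << 20, 1.0f);   // host data shared by all GPUs
    std::vector<GpuTask> tasks(gpuCount);  // one individual record per GPU
    std::vector<std::thread> threads;

    for (int i = 0; i < gpuCount; ++i) {
        tasks[i] = GpuTask{i, A.data(), nullptr, A.size(), cudaSuccess};
        threads.emplace_back(gpuThread, &tasks[i]);
    }
    for (std::thread& th : threads) th.join();

    for (const GpuTask& t : tasks)
        std::printf("GPU %d: %s\n", t.id, cudaGetErrorString(t.status));
    return 0;
}
```

This should build with something like nvcc -std=c++11 on the file (on Linux, -Xcompiler -pthread may also be needed). Note that from CUDA 4.0 onward a single host thread can switch devices with cudaSetDevice and allocations are visible to every thread in the process, so allocating everything up front is a workable design on a current toolkit.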

1304498454 · March 11, 2018, 4:27am

Where can I find the SDK download?
