Thread processing overhead

iam_peter · February 15, 2011, 6:07pm

Hello,

I have a problem with allocating memory and writing to it from kernel threads.
My situation:

I want to write a CUDA acceleration structure.
So I start with a bunch of triangles, say 17 of them.
The task is to compute the bounding box of each triangle.
I allocate tri_count * 2 * sizeof(float3) bytes (two float3 values per triangle), which the kernel threads write into.
Now I launch 4 blocks of 5 threads, with every thread handling one triangle (the configuration is just a simple example).

The problem is that the allocated memory does not match the thread count.
There are more threads (4 × 5 = 20) than triangles (17), so the writes of the last 3 threads land outside the allocated memory.
What is the best way to avoid this?

Regards,
Peter

kbam · February 16, 2011, 4:37am

Something like this (from the CUDA C Programming Guide, Version 3.2):

__global__ void VecAdd(float* A, float* B, float* C, int N)
{
  int i = blockDim.x * blockIdx.x + threadIdx.x;  // global index of this thread

  if (i < N)               // threads past the end of the arrays do nothing
    C[i] = A[i] + B[i];
}

where you pass N (17 in your example) to the kernel like this:

VecAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, N);
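Applied to the bounding-box kernel from the question, the same guard might look like the sketch below. It assumes (the post does not say) that triangle t is stored as three consecutive float3 vertices and that its box goes into output slots 2*t (min corner) and 2*t + 1 (max corner), matching the tri_count * 2 float3 allocation; boundsForThread, boundsKernel, blocksFor and computeBounds are hypothetical names. The grid size is rounded up with (N + threadsPerBlock - 1) / threadsPerBlock, as in the Programming Guide's VecAdd host code, so there are always at least N threads and the guard discards the surplus.

```cuda
#include <cassert>
#include <cmath>
#include <cuda_runtime.h>  // float3, make_float3, kernel launch support

// Hypothetical per-thread work, factored into a __host__ __device__ function
// so the same code runs inside the kernel and can be checked on the CPU.
// Thread i handles triangle i; a thread with i >= n returns immediately,
// so nothing is written past the end of the 2 * n element output buffer.
__host__ __device__ inline void boundsForThread(const float3* verts, float3* boxes,
                                                int n, int i)
{
    if (i >= n) return;  // surplus thread: no triangle left for it
    float3 a = verts[3 * i], b = verts[3 * i + 1], c = verts[3 * i + 2];
    boxes[2 * i] = make_float3(fminf(a.x, fminf(b.x, c.x)),      // min corner
                               fminf(a.y, fminf(b.y, c.y)),
                               fminf(a.z, fminf(b.z, c.z)));
    boxes[2 * i + 1] = make_float3(fmaxf(a.x, fmaxf(b.x, c.x)),  // max corner
                                   fmaxf(a.y, fmaxf(b.y, c.y)),
                                   fmaxf(a.z, fmaxf(b.z, c.z)));
}

__global__ void boundsKernel(const float3* verts, float3* boxes, int n)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;  // global thread index
    boundsForThread(verts, boxes, n, i);
}

// Smallest block count with blocks * threadsPerBlock >= n (rounds up).
inline int blocksFor(int n, int threadsPerBlock)
{
    assert(n > 0 && threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical host wrapper. d_verts holds 3 * n float3 and d_boxes holds
// 2 * n float3, both in device memory. Error checking is omitted for brevity.
void computeBounds(const float3* d_verts, float3* d_boxes, int n, int threadsPerBlock)
{
    if (n <= 0) return;  // a launch with zero blocks is an invalid configuration
    boundsKernel<<<blocksFor(n, threadsPerBlock), threadsPerBlock>>>(d_verts, d_boxes, n);
}
```

With n = 17 and threadsPerBlock = 5, blocksFor returns 4, which gives the 20 threads from the question; threads 17, 18 and 19 return at the i >= n check without writing anything.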

iam_peter · February 16, 2011, 6:42pm

Thanks, I did it this way.
I thought there would be another solution built into CUDA, since this is such a common problem,
but this one works well too.

Regards,
Peter
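On the question of a built-in solution: the grid size is chosen by the caller, so the remainder still has to be handled somewhere in the kernel. A widely used alternative to launching just enough threads is the grid-stride loop (not something suggested in this thread), where each thread steps through the data in increments of the total number of threads in the grid. A grid of any size then covers all N elements, and the loop condition doubles as the bounds check. The sketch assumes, as the posts do not say, that triangle t is stored as three consecutive float3 vertices and that its box occupies output slots 2*t (min) and 2*t + 1 (max); boundsStrided and boundsGridStride are hypothetical names.

```cuda
#include <cmath>
#include <cuda_runtime.h>  // float3, make_float3

// Hypothetical loop body shared by the kernel and a CPU check: the thread
// starting at index 'first' handles first, first + stride, first + 2 * stride, ...
// and stops once the index reaches n, so no thread writes past slot 2 * n - 1.
__host__ __device__ inline void boundsStrided(const float3* verts, float3* boxes,
                                              int n, int first, int stride)
{
    for (int i = first; i < n; i += stride) {
        float3 a = verts[3 * i], b = verts[3 * i + 1], c = verts[3 * i + 2];
        boxes[2 * i] = make_float3(fminf(a.x, fminf(b.x, c.x)),      // min corner
                                   fminf(a.y, fminf(b.y, c.y)),
                                   fminf(a.z, fminf(b.z, c.z)));
        boxes[2 * i + 1] = make_float3(fmaxf(a.x, fmaxf(b.x, c.x)),  // max corner
                                       fmaxf(a.y, fmaxf(b.y, c.y)),
                                       fmaxf(a.z, fmaxf(b.z, c.z)));
    }
}

__global__ void boundsGridStride(const float3* verts, float3* boxes, int n)
{
    int first = blockDim.x * blockIdx.x + threadIdx.x;  // global thread index
    int stride = blockDim.x * gridDim.x;                // total threads in the grid
    boundsStrided(verts, boxes, n, first, stride);
}
```

For example, a launch of boundsGridStride<<<2, 5>>>(d_verts, d_boxes, 17) runs only 10 threads but still covers all 17 triangles: threads 0 to 6 each handle two triangles (t and t + 10), and threads 7 to 9 handle one each.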
