Is padding needed for Malloc?

TimoS · September 10, 2009, 11:38am

When malloc'ing, say, 10-element vectors a(), b() and c(), and computing c(i)=a(i)*b(i) with 16 threads, what happens with the 11th to 16th threads? Do they read past the end of a() and b() and write the result past the end of c()? I can’t see any way to control what happens when there are more threads than vector elements. Is the only option to malloc extra elements for the remaining threads?

LSChien · September 10, 2009, 2:37pm

You must set the boundary condition explicitly in your kernel code, for example:

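[codebox]// Element-wise product C = A * B.
// BLOCK_DIM must be defined as the number of threads per block used at launch.
__global__ void add( float *C, float *A, float *B, unsigned int N )
{
	// global index of this thread across all blocks
	unsigned int idx = blockIdx.x * BLOCK_DIM + threadIdx.x;

	if ( idx < N ){ // N is size of A and B; threads with idx >= N skip the body
		C[idx] = A[idx] * B[idx];
	}
}[/codebox]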
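
For completeness, here is a minimal sketch of how the host side might look with such a guard in place. The kernel name `multiply`, the helper `multiplyOnGpu` and the block size of 16 are my own assumptions for illustration, not something from the posts above. It uses the built-in `blockDim.x` instead of a `BLOCK_DIM` constant. The point is that the buffers are allocated for exactly N elements, with no padding, and the block count is simply rounded up.

```cuda
#include <cuda_runtime.h>

// Element-wise product with a bounds guard: threads whose index is >= n
// skip the body, so they never touch memory outside the arrays.
__global__ void multiply(float *c, const float *a, const float *b, unsigned int n)
{
    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        c[idx] = a[idx] * b[idx];
    }
}

// Hypothetical host helper (name and signature are assumptions):
// copies the inputs to the device, launches enough blocks to cover
// n elements, and copies the result back into hostC.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t multiplyOnGpu(float *hostC, const float *hostA, const float *hostB,
                          unsigned int n)
{
    if (n == 0) return cudaSuccess;            // nothing to do, avoid a 0-block launch

    const unsigned int threadsPerBlock = 16;   // assumed block size, as in the question
    const unsigned int blocks = (n + threadsPerBlock - 1) / threadsPerBlock; // round up
    const size_t bytes = n * sizeof(float);    // exactly n elements, no padding

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaError_t err = cudaMalloc((void **)&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dC, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dA, hostA, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hostB, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        // For n = 10 this launches 1 block of 16 threads; threads 10..15 fail the guard.
        multiply<<<blocks, threadsPerBlock>>>(dC, dA, dB, n);
        err = cudaGetLastError();              // reports launch configuration errors
    }
    // This blocking copy also waits for the kernel to finish.
    if (err == cudaSuccess) err = cudaMemcpy(hostC, dC, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);                              // cudaFree(nullptr) is a harmless no-op
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

Calling `multiplyOnGpu(c, a, b, 10)` with three 10-element host arrays should fill `c` with the products, while the six surplus threads do nothing because their `idx` fails the `idx < n` test.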

TimoS · September 25, 2009, 7:39pm

Hi. Thanks for the example. It is still a bit unclear to me what happens to the “overflow” threads between N and the block size. As far as I understand, all threads in a block execute exactly the same instructions. How, then, are the overflow threads made to do nothing?

TimoS
