General methods for optimizing a CUDA program

shong · January 21, 2009, 3:25am

Hi everybody,

I have written a CUDA program, but I can't get an ideal speedup: currently it is only about 10 times faster on a GTX 280. Could you please help me optimize the program? My questions are:

1. Coalescing. How can I make accesses to a 1D array always coalesced? Should I use cudaMallocPitch()? I know the benefit of coalescing, but I have no idea how to achieve it. Could you please give me some ideas?
2. Texture. To what extent is texture memory better than coalesced global memory?
3. How should I set the grid size? Sometimes when I run the program, it exits with an "Unspecified launch failure" error message. In that situation I split the kernel into several launches, and it works well, but I want to know why it works. Some say the cause is the Windows watchdog, but the program has other kernels that run longer than these problem kernels, and they run fine. I am confused.
4. How should I deal with a large array that can't fit into shared memory? Is it faster to read 64 float4 values than 256 float values?
5. Do you know of other optimization methods to speed up the program?
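
To make the coalescing and launch-failure questions above concrete, here is a minimal sketch. It is not the program being asked about: the kernel `scale_coalesced`, the helper `run_demo`, the `CHECK` macro, and the sizes are all hypothetical names and values chosen for illustration. It assumes a plain 1D array allocated with cudaMalloc() and indexed by the global thread index, so that neighbouring threads touch neighbouring elements, and it checks the error code after the launch so a failure is reported where it happens.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: report a CUDA runtime error and return 1.
// (A sketch only: on failure it returns without freeing earlier allocations.)
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s failed: %s\n", #call,             \
                    cudaGetErrorString(err_));                    \
            return 1;                                             \
        }                                                         \
    } while (0)

// Hypothetical kernel: thread i reads in[i] and writes out[i], so
// consecutive threads access consecutive floats (the coalesced pattern).
__global__ void scale_coalesced(const float* in, float* out, int n, float k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {                 // guard: the last block may be partly empty
        out[i] = k * in[i];
    }
}

// Runs the kernel on n elements and verifies the result on the host.
// Returns 0 on success, 1 on any CUDA error or mismatch.
int run_demo(int n)
{
    size_t bytes = (size_t)n * sizeof(float);
    float* h = (float*)malloc(bytes);
    if (h == NULL) return 1;
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float* d_in = NULL;
    float* d_out = NULL;
    CHECK(cudaMalloc((void**)&d_in, bytes));
    CHECK(cudaMalloc((void**)&d_out, bytes));
    CHECK(cudaMemcpy(d_in, h, bytes, cudaMemcpyHostToDevice));

    int threads = 256;                           // assumed block size
    int blocks = (n + threads - 1) / threads;    // round up to cover n
    scale_coalesced<<<blocks, threads>>>(d_in, d_out, n, 2.0f);

    CHECK(cudaGetLastError());        // invalid launch configuration, etc.
    CHECK(cudaDeviceSynchronize());   // errors raised while the kernel ran

    CHECK(cudaMemcpy(h, d_out, bytes, cudaMemcpyDeviceToHost));
    int bad = 0;
    for (int i = 0; i < n; ++i) {
        if (h[i] != 2.0f * (float)i) { bad = 1; break; }
    }

    CHECK(cudaFree(d_in));
    CHECK(cudaFree(d_out));
    free(h);
    return bad;
}

int main()
{
    return run_demo(1 << 20);
}
```

In this sketch no pitched allocation is needed, because cudaMalloc() already returns a suitably aligned base pointer for a 1D array; cudaMallocPitch() pads the rows of a 2D array so that each row starts on an aligned address. On toolkits from the time of this post, the synchronization call was named cudaThreadSynchronize() rather than cudaDeviceSynchronize().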

thanks.

cheers,

peter
