Memory Optimization Coalesce

texwiller204 · May 12, 2010, 2:11pm

Dear,
I am trying to optimize my CUDA implementation. After investigating with the CUDA Visual Profiler, I found that my problem is uncoalesced memory access.

Here is my problem:
I have, let's say, 2000 blocks with 128 threads each.
The threads of each block share some common data (a data structure of 13 integer variables).
How can I coalesce the reads from global memory when I first load this data into shared memory?

I read this thread: http://forums.nvidia.com/index.php?showtopic=79620&pid=452234&mode=threaded&show=&st=#entry452234 (The Official NVIDIA Forums)
The problem is that I am confused by the way he is reading the data (int and struct hello) and by what he means by smemAOS.

Thanks for your reply.

Willer
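
For reference, the pattern being asked about can be sketched roughly as follows. This is only a minimal sketch under assumptions: the names BlockParams, kWords, sumParams and runSumParams are hypothetical, the per-block records are assumed to sit back to back in one global array indexed by blockIdx.x, and error checking is left out. The idea is to view the 13-int record as a flat run of ints so that consecutive threads read consecutive 4-byte words, then synchronize before any thread uses the shared copy.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int kWords = 13;             // 13 ints per record, as in the question

// Hypothetical per-block record (the real struct layout is not given in the thread).
struct BlockParams { int v[kWords]; };

__global__ void sumParams(const BlockParams* params, int* out)
{
    __shared__ BlockParams s;

    // Treat this block's record as a flat array of ints:
    // thread 0 reads word 0, thread 1 reads word 1, and so on.
    const int* src = reinterpret_cast<const int*>(&params[blockIdx.x]);
    int* dst = reinterpret_cast<int*>(&s);
    for (int i = threadIdx.x; i < kWords; i += blockDim.x)
        dst[i] = src[i];
    __syncthreads();                   // the record is now visible to the whole block

    // Placeholder work: every thread reads the shared copy instead of global memory.
    int acc = 0;
    for (int i = 0; i < kWords; ++i)
        acc += s.v[i];
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc + threadIdx.x;
}

// Host helper: fills record b with the values b, b+1, ..., b+12 and runs the kernel.
std::vector<int> runSumParams(int numBlocks, int threadsPerBlock)
{
    std::vector<BlockParams> h(numBlocks);
    for (int b = 0; b < numBlocks; ++b)
        for (int i = 0; i < kWords; ++i)
            h[b].v[i] = b + i;

    const size_t n = static_cast<size_t>(numBlocks) * threadsPerBlock;
    BlockParams* dParams = nullptr;
    int* dOut = nullptr;
    cudaMalloc(&dParams, numBlocks * sizeof(BlockParams));
    cudaMalloc(&dOut, n * sizeof(int));
    cudaMemcpy(dParams, h.data(), numBlocks * sizeof(BlockParams),
               cudaMemcpyHostToDevice);

    sumParams<<<numBlocks, threadsPerBlock>>>(dParams, dOut);

    std::vector<int> out(n);
    cudaMemcpy(out.data(), dOut, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dParams);
    cudaFree(dOut);
    return out;
}
```

Calling runSumParams(2000, 128) matches the launch configuration described above. On older devices with strict alignment rules for coalescing, padding each record from 13 to 16 ints so that it starts on a 64-byte boundary may also be worth trying.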

Sijo · May 21, 2010, 1:17pm

Coalescing issues can be avoided by binding that memory to a texture on the GPU, because texture fetches go through a cache. Note that texture memory is read-only.
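
A rough sketch of that suggestion, under assumptions: it uses the texture object API of newer CUDA toolkits (cudaCreateTextureObject and tex1Dfetch) rather than the cudaBindTexture style binding that was current when this thread was written, the names sumParamsTex and runSumParamsTex are hypothetical, the 13-int records are stored as one flat int array, and error checking is left out.

```cuda
#include <cassert>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

constexpr int kWords = 13;             // 13 ints per block, as in the question

__global__ void sumParamsTex(cudaTextureObject_t tex, int* out)
{
    // Every thread fetches its block's 13 words through the texture cache.
    int acc = 0;
    for (int i = 0; i < kWords; ++i)
        acc += tex1Dfetch<int>(tex, blockIdx.x * kWords + i);
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc + threadIdx.x;
}

// Host helper: word i of block b holds b + i; the buffer is read via a texture object.
std::vector<int> runSumParamsTex(int numBlocks, int threadsPerBlock)
{
    const size_t words = static_cast<size_t>(numBlocks) * kWords;
    std::vector<int> h(words);
    for (int b = 0; b < numBlocks; ++b)
        for (int i = 0; i < kWords; ++i)
            h[static_cast<size_t>(b) * kWords + i] = b + i;

    const size_t n = static_cast<size_t>(numBlocks) * threadsPerBlock;
    int* dData = nullptr;
    int* dOut = nullptr;
    cudaMalloc(&dData, words * sizeof(int));
    cudaMalloc(&dOut, n * sizeof(int));
    cudaMemcpy(dData, h.data(), words * sizeof(int), cudaMemcpyHostToDevice);

    // Describe the linear device buffer that backs the texture.
    cudaResourceDesc resDesc;
    memset(&resDesc, 0, sizeof(resDesc));
    resDesc.resType = cudaResourceTypeLinear;
    resDesc.res.linear.devPtr = dData;
    resDesc.res.linear.desc = cudaCreateChannelDesc<int>();
    resDesc.res.linear.sizeInBytes = words * sizeof(int);

    cudaTextureDesc texDesc;
    memset(&texDesc, 0, sizeof(texDesc));
    texDesc.readMode = cudaReadModeElementType;   // return the raw int values

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);

    sumParamsTex<<<numBlocks, threadsPerBlock>>>(tex, dOut);

    std::vector<int> out(n);
    cudaMemcpy(out.data(), dOut, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaDestroyTextureObject(tex);
    cudaFree(dData);
    cudaFree(dOut);
    return out;
}
```

The kernel can only read through the texture, so anything it needs to write still has to go to an ordinary global memory buffer such as dOut here.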