I’m a student trying to use CUDA for image processing algorithms, and I only started learning it recently. I have read through the Programming Guide and am now trying to understand some of the sample code, in particular “SobelFilter”. Everything made sense until I got to the “SOBELSHARED” mode, which uses shared memory.
Code I have questions about (part 1), from SobelFilter.cu:
int BlockWidth = 80; // must be divisible by 16 for coalescing
dim3 blocks = dim3(iw/(4*BlockWidth)+(0!=iw%(4*BlockWidth)),
int SharedPitch = ~0x3f&(4*(BlockWidth+2*Radius)+0x3f);
int sharedMem = SharedPitch*(threads.y+2*Radius);
// for the shared kernel, width must be divisible by 4
iw &= ~3;
What’s the meaning of “BlockWidth”?
Why is “blocks” calculated like that? I understand the (0!=iw%(4*BlockWidth)) term, which adds one extra block when there is a remainder, but why iw/(4*BlockWidth)? “iw” is the image width in pixels, right? Why is it divided by BlockWidth rather than threads.x, and why the factor of 4?
I guess SharedPitch is the width in bytes of one row of the shared memory a block owns. Again, why is it calculated like that? Why the factor of 4? Is the “0x3f” part some kind of alignment, i.e. rounding up to a multiple of 64 bytes?
I expect I’ll understand sharedMem once I understand SharedPitch. It looks right, because “threads.y+2*Radius” is the number of rows of shared memory for one block.
If “iw” is not a multiple of 4 in practice, do we simply ignore the remaining pixels?
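To check my reading of the arithmetic, here is a small host-side sketch of the four expressions as I currently understand them. It is not the sample's code: the function names are my own, and the comments describe my guesses (for example, that one block covers 4*BlockWidth pixels per row), not anything confirmed by the sample.

```cpp
// Hypothetical helpers that mirror the expressions quoted above.
// All names here are made up for illustration.

// Blocks needed across the image width, ASSUMING one block covers
// 4*blockWidth pixels per row: integer division, plus 1 if a remainder exists.
constexpr int blocksAcross(int iw, int blockWidth) {
    return iw / (4 * blockWidth) + (0 != iw % (4 * blockWidth));
}

// Adding 0x3f and then clearing the low 6 bits rounds the value
// 4*(blockWidth + 2*radius) up to the next multiple of 64.
constexpr int sharedPitch(int blockWidth, int radius) {
    return ~0x3f & (4 * (blockWidth + 2 * radius) + 0x3f);
}

// Pitch times the number of rows (threadsY rows plus a border of
// `radius` rows above and below).
constexpr int sharedMemBytes(int pitch, int threadsY, int radius) {
    return pitch * (threadsY + 2 * radius);
}

// Clearing the low 2 bits rounds the width down to a multiple of 4.
constexpr int roundDownTo4(int iw) {
    return iw & ~3;
}
```

With BlockWidth = 80 and assuming Radius is 1, sharedPitch(80, 1) gives 384, since 4*(80+2) = 328 rounded up to a multiple of 64 is 384. With iw = 512, blocksAcross(512, 80) gives 2, and roundDownTo4(515) gives 512, so the last three pixels of each row would be dropped if my reading is right.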
I have more questions about the kernel code, but I’m a little tired now. Hopefully many of them will disappear once the ones above are answered. Any help would be appreciated.