Hi!
I’d like to develop a fast block matching algorithm with CUDA. I already wrote a program, but it isn’t efficient enough: it takes 35 ms to compute the motion vectors (MVs) for the 4*4 blocks of 720*522 frames.
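For reference, the cost usually minimized in block matching is the sum of absolute differences (SAD) between a block of the current frame and a displaced block of the reference frame. Here is a minimal plain-C sketch for one 4*4 block. The function name, the parameters and the 8-bit row-major frame layout are assumptions made for illustration; none of them come from the program discussed in this post.

```c
#include <stdlib.h>

/* SAD between the 4x4 block of the current frame whose top-left pixel
 * is (bx, by) and the block of the reference frame displaced by
 * (dx, dy). Hypothetical helper: both frames are assumed to be 8-bit,
 * row-major, with `stride` bytes per row. The caller must make sure
 * both blocks lie entirely inside their frames. */
static unsigned sad_4x4(const unsigned char *cur, const unsigned char *ref,
                        int stride, int bx, int by, int dx, int dy)
{
    unsigned sad = 0;
    for (int y = 0; y < 4; ++y) {
        for (int x = 0; x < 4; ++x) {
            int c = cur[(by + y) * stride + (bx + x)];
            int r = ref[(by + dy + y) * stride + (bx + dx + x)];
            sad += (unsigned)abs(c - r);
        }
    }
    return sad;
}
```

The motion vector of a block is then the displacement (dx, dy) with the smallest SAD over the search window, so the work grows with the number of blocks times the number of candidate positions.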
So I found and read articles on the topic, and they all refer to the same SAD program, which I finally found at this address: http://code.google.com/p/gpuocelot/source/…vn324&r=223.
But there are some portions of the code that I don’t understand.
For example:
[codebox]/* Allocate SAD data on the device */
cudaMalloc((void **)&d_sads, 41 * MAX_POS_PADDED * image_size_macroblocks *
           sizeof(unsigned short));[/codebox]
Why do they use the number 41? And:
[codebox]mb_sad_calc<<<dim3(CEIL(ref_image->width / 4, THREADS_W),
                   CEIL(ref_image->height / 4, THREADS_H)),
              dim3(CEIL(MAX_POS, POS_PER_THREAD) * THREADS_W * THREADS_H),
              SAD_LOC_SIZE_BYTES>>>
    (d_sads,
     (unsigned short *)d_cur_image,
     image_width_macroblocks,
     image_height_macroblocks);
CUDA_ERRCK[/codebox]
How can they compute all the SADs for the 4*4 blocks if the array d_sads has been allocated with MAX_POS_PADDED * image_size_macroblocks?
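My only guess so far for the 41, which I have not confirmed against the source: H.264 allows seven block shapes inside a 16*16 macroblock, and counting the blocks of every shape gives 1 + 2 + 2 + 4 + 8 + 8 + 16 = 41. If that is right, and if image_size_macroblocks counts 16*16 macroblocks, then d_sads would hold one SAD per block shape, per search position, per macroblock, and the sixteen 4*4 SADs would be among those 41. Here is a small plain-C check of the count (the function names are mine, not from the program):

```c
/* Number of w x h blocks needed to tile one 16x16 macroblock. */
static int blocks_per_macroblock(int w, int h)
{
    return (16 / w) * (16 / h);
}

/* Total block count over the seven H.264 partition shapes.
 * Assumption: this is where the constant 41 comes from. */
static int total_partitions(void)
{
    static const int shapes[7][2] = {
        {16, 16}, {16, 8}, {8, 16}, {8, 8}, {8, 4}, {4, 8}, {4, 4}
    };
    int total = 0;
    for (int i = 0; i < 7; ++i)
        total += blocks_per_macroblock(shapes[i][0], shapes[i][1]);
    return total;
}
```

Calling total_partitions() gives 41, which at least matches the constant in the cudaMalloc call.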
I think understanding this code would be very helpful for me.
Could somebody explain it to me? Thanks in advance.
P.S.: sorry for my bad English, I’m French ;)