The app doesn't call some global functions... why?!?! I'm going crazy!

I wrote a small app made up of CPU code and CUDA code.

I don't understand why some __global__ functions (kernels) launch without problems and others don't!

This is the code, and the bum part (!!!) is in the last 4 lines…

	while (*changed) {
		printf("%d\n", n);

		// Copy the current value of the flag to the device.
		cudaMemcpy(d_changed, changed, sizeof(bool), cudaMemcpyHostToDevice);

		mediaReset<<< dimGrid2, dimBlock2 >>>(d_media, nsoglie);
		mediaDistance<<< dimGrid1, dimBlock1 >>>(d_media, d_newImg, nsoglie, img->width(), img->height());
		mediaRecalculate<<< dimGrid2, dimBlock2 >>>(d_media, d_changed, nsoglie);

		// Read the flag back to decide whether to run another iteration.
		cudaMemcpy(changed, d_changed, sizeof(bool), cudaMemcpyDeviceToHost);
	}

	// These are the two launches that never seem to run:
	votiInMat<<< dimGrid1, dimBlock1 >>>(d_media, d_newImg, d_votiMat, nsoglie, img->width(), img->height());
	votazioneFinestra<<< dimGrid1, dimBlock1 >>>(d_media, d_newImg, d_votiMat, nsoglie, raggio, img->width(), img->height());


I've gone over the code again and again, and I can't find ANYTHING WRONG! These two global functions look just like the others…

The compiler finishes without any error, but when I run the app I can see that these last two global functions are never executed!

I’m using Ubuntu 10.4… can someone help me?


Include some error checking right after each of the last two kernel calls:


cout << cudaGetErrorString(cudaGetLastError()) << endl;

What are width() and height()?
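
For what it's worth, here is a minimal sketch of the kind of check suggested above. The names `checkCuda`, `dummyKernel` and `launchAndCheck` are hypothetical and not part of the original app, and the kernel is only a stand-in. It assumes a reasonably recent CUDA toolkit that provides `cudaDeviceSynchronize`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print a labelled message if err is not cudaSuccess.
// Returns true when the call succeeded.
static bool checkCuda(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Stand-in kernel (assumption): each thread writes its own index.
__global__ void dummyKernel(int *out) {
    out[threadIdx.x] = threadIdx.x;
}

// Launch one block of threadsPerBlock threads and report whether it worked.
static bool launchAndCheck(int threadsPerBlock) {
    int *d_out = nullptr;
    if (!checkCuda(cudaMalloc(&d_out, 4096 * sizeof(int)), "cudaMalloc"))
        return false;

    dummyKernel<<<1, threadsPerBlock>>>(d_out);

    // Launch errors (for example a bad grid or block size) show up here.
    bool ok = checkCuda(cudaGetLastError(), "kernel launch");
    // Errors raised while the kernel runs show up after synchronizing.
    ok = checkCuda(cudaDeviceSynchronize(), "kernel execution") && ok;

    cudaFree(d_out);
    return ok;
}
```

In the posted loop, the same two checks would go right after each of the votiInMat and votazioneFinestra launches. A kernel launch does not return an error code and does not stop the program, so without a check like this a failed launch passes silently.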

The problem was the excessive block size!

I reduced it to 400 threads per block and now it works without problems, even with big images!

P.S. width() and height() refer to the image ;)
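
A block size like the one that caused the problem above can be caught before launching by asking the runtime for the device limits. This is only a sketch: `blockFitsDevice` is a hypothetical helper, and it assumes a 2D block on device 0. For context, devices of compute capability 1.x allow at most 512 threads per block, later ones 1024.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: true if a bx-by-by thread block respects the
// limits reported by the given device (device 0 by default).
static bool blockFitsDevice(unsigned bx, unsigned by, int device = 0) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return false;  // could not query the device

    // The total thread count and each dimension have separate limits.
    return bx * by <= (unsigned)prop.maxThreadsPerBlock
        && bx <= (unsigned)prop.maxThreadsDim[0]
        && by <= (unsigned)prop.maxThreadsDim[1];
}
```

Calling something like `blockFitsDevice(dimBlock1.x, dimBlock1.y)` before the launches would have flagged the oversized block right away.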

Make your block size a multiple of 64 (or at least 32, the warp size) for best performance.
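
As a rough sketch of that advice, the arithmetic could look like this. `roundDownToWarp` and `blocksFor` are hypothetical helper names, and the warp size of 32 is hard-coded as an assumption (it can also be read from `cudaDeviceProp::warpSize`).

```cuda
// Assumption: 32 threads per warp, as on current NVIDIA GPUs.
const int kWarpSize = 32;

// Round a thread count down to a multiple of the warp size,
// but never below one full warp.
static int roundDownToWarp(int threads) {
    int rounded = (threads / kWarpSize) * kWarpSize;
    return rounded < kWarpSize ? kWarpSize : rounded;
}

// Number of blocks needed so that blocks * threadsPerBlock covers n elements.
static int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}
```

With these, the 400 threads per block mentioned above would become `roundDownToWarp(400)`, which is 384, and 1000 elements would then need `blocksFor(1000, 384)`, which is 3 blocks. Since the last block usually has spare threads, the kernel still needs its own bounds check.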