I was having some performance problems in my 2D implementations.
I don't know why, but I did something ill-advised when calculating the position in the kernel: instead of computing the address in bytes using the pitch, as in the examples in the programming guide, I simply used pitch/sizeof(float) to calculate the index.
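For reference, this is the byte-based pitched addressing I should have used; it is just a sketch along the lines of the programming guide, and the kernel and parameter names are my own:

__global__ void process(float* devPtr, size_t pitch, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        // Advance the base pointer by y*pitch BYTES, then index floats within the row.
        float* row = (float*)((char*)devPtr + y * pitch);
        row[x] *= 2.0f;  // placeholder work on the element
    }
}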
While trying to fix that, I found something very strange, and I suspect a bug in the compiler.
If I use sizeof in my kernel like this (I know it is silly code, but it is just to show the weird behavior), where px is an int:

int size1 = sizeof(float);
int size2 = sizeof(float);
px = px * size1 / size2;
my program's timing is 300 ms on a GTX 280.
But, if I change the program to this:
px = px * sizeof(float) / sizeof(float);
the timing becomes 630 ms on the same graphics card.
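To make this easy to reproduce, here is a minimal sketch of the two variants side by side (the kernel names, the data buffer, and the surrounding code are placeholders I made up; the 300/630 ms timings above are for my full program, not these toy kernels):

__global__ void withIntLocals(float* data, int n)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    if (px >= n) return;
    // sizeof results stored into ints before doing the arithmetic
    int size1 = sizeof(float);
    int size2 = sizeof(float);
    px = px * size1 / size2;
    data[px] += 1.0f;
}

__global__ void withSizeofDirect(float* data, int n)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    if (px >= n) return;
    // sizeof(float) used directly; note that sizeof yields a size_t,
    // so this expression is evaluated in unsigned size_t arithmetic
    // before being stored back into the int
    px = px * sizeof(float) / sizeof(float);
    data[px] += 1.0f;
}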
Do you have any idea what is happening?
By the way, I am using Linux.