Bug in the NVIDIA compiler?

Hello,

I was having some performance problems in my 2D implementations.

I don't know why, but I did something silly when calculating the position in the kernel: instead of using the pitch and computing the address in bytes, as in the programming guide examples, I simply used pitch/sizeof(float) to compute the index. While trying to fix that, I found something very weird.
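To be concrete, this is roughly what I mean (a stripped-down sketch, not my real kernel; the names are made up and the buffer is assumed to come from cudaMallocPitch):

__global__ void example(float* devPtr, size_t pitch, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // addressing in bytes, like the programming guide examples
    float* row = (float*)((char*)devPtr + y * pitch);

    // the shortcut I used instead: treat the pitch as an element count
    int px = y * (int)(pitch / sizeof(float)) + x;

    // both ways should reach the same element
    devPtr[px] = row[x] + 1.0f;
}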

I suspect some bug in the compiler.

If I use sizeof in my kernel like this, where px is an int (I know it is stupid code, but it's just to show the weird behavior):

int size1 = sizeof(float);
int size2 = sizeof(float);
px = px * size1 / size2;

my program runs in 300 ms on a GTX 280.

But, if I change the program to this:

px = px * sizeof(float) / sizeof(float);

I get 630 ms on the same graphics card.
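Boiled down, the two versions only differ in that one line. A stripped-down sketch of what I am comparing (not my real code, names made up):

__global__ void with_int_temps(float* data, int n)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    if (px >= n) return;

    // same pattern as the 300 ms version of my program
    int size1 = sizeof(float);
    int size2 = sizeof(float);
    px = px * size1 / size2;

    data[px] += 1.0f;
}

__global__ void with_sizeof_inline(float* data, int n)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    if (px >= n) return;

    // same pattern as the 630 ms version of my program
    px = px * sizeof(float) / sizeof(float);

    data[px] += 1.0f;
}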

Do you have any idea what is happening?

By the way, I am using Linux.

It’s not a bug if it’s not broken.

Use decuda to see what the real machine code is.

Also, check how much local memory you're using. The compiler may spill a register to lmem when you don't expect it and decimate performance.
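For example, compiling with nvcc --ptxas-options=-v (or -Xptxas -v) makes ptxas print the register, smem and lmem usage of each kernel, and nvcc -cubin (or -keep) gives you a .cubin file you can run through decuda.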