Strange performance issue

I am wondering why the kernel pad_kernel_fast runs 16x faster than pad_kernel_slow? The only difference between them is dst[idx] = 3.23424324 versus dst[idx] = srcvalue.

__global__ void pad_kernel_fast(float *dst, float *src, int width, int height)
{
    const int ix = blockDim.x * blockIdx.x + threadIdx.x;
    const int iy = blockDim.y * blockIdx.y + threadIdx.y;

    float srcvalue = src[iy * width + ix];
    dst[(iy + 2) * (width + 5) + ix + 2] = 3.23424324;
}

__global__ void pad_kernel_slow(float *dst, float *src, int width, int height)
{
    const int ix = blockDim.x * blockIdx.x + threadIdx.x;
    const int iy = blockDim.y * blockIdx.y + threadIdx.y;

    float srcvalue = src[iy * width + ix];
    dst[(iy + 2) * (width + 5) + ix + 2] = srcvalue;
}

Regards,

zlf

The compiler optimizes and throws out unused calculations.

Try changing 3.23424324 to 3.234243f and test. 3.23424324 is a double literal which is being converted to a float.
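
A minimal sketch of that change, assuming the rest of the kernel body stays as posted above:

    // Sketch: the 'f' suffix makes the literal single precision,
    // avoiding the implicit double-to-float conversion.
    dst[(iy + 2) * (width + 5) + ix + 2] = 3.234243f;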

–edit–
My bad, never mind, I misread your problem…

Dear Lev

I turned “Optimization” to “Disabled (/Od)” in “CUDA Build Rule/General”. The result is the same as before. Is there any way to turn compiler optimization off?

Regards

zlf

In the first case the compiler eliminates the load of srcvalue; maybe the code generator does too.

float srcvalue = src[iy * width + ix]; is never used, so I think there is no computation and no read of src[iy * width + ix].
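
In other words, once the unused load is eliminated, pad_kernel_fast effectively behaves like the sketch below (an illustration of the dead-code elimination described above, not actual compiler output):

__global__ void pad_kernel_fast_effective(float *dst, float *src, int width, int height)
{
    const int ix = blockDim.x * blockIdx.x + threadIdx.x;
    const int iy = blockDim.y * blockIdx.y + threadIdx.y;

    // src is never read, so the global-memory load is removed; only the store remains.
    dst[(iy + 2) * (width + 5) + ix + 2] = 3.23424324;
}

One way to check this would be to inspect the generated PTX, or to make the stored value depend on srcvalue (as pad_kernel_slow already does); if the load comes back, the timing gap should close.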