Could somebody help me analyze this result?

chickennight · August 20, 2017, 1:14pm

I did an experiment on how an SM executes two instructions. I use a vector add as the test, with 512 threads in total. In the one-instruction case, all 512 threads do the same work. In the two-instruction case, the first 256 threads do the first piece of work and the other 256 threads do the second.
The code for the one-instruction case:
while (id < n) {
	for (k = 0; k < 18000; k++)
		c[id] = a[id] + b[id];
	// grid-stride step: move to the next element handled by this thread
	id += gridDim.x*blockDim.x;
}

The code for the two-instruction case looks like the following:

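while (id < n) {
	for (k = 0; k < 18000; k++)
	{
		if (id < 256) {
			// first 256 threads write to c
			c[id] = a[id] + b[id];
		}
		else {
			// remaining threads write to u
			u[id] = a[id] + b[id];
		}
	}
	// same grid-stride step as in the one-instruction version
	id += gridDim.x*blockDim.x;
}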

As you can see, in the two-instruction case the two instructions are the same.
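One thing that may be worth checking when comparing the two versions is whether the `if (id < 256)` test ever splits a warp, since threads of one warp that take different branches are executed one branch after the other. The sketch below is host-only code (it compiles with nvcc or any C++ compiler) and is not from the original post. The helper `count_divergent_warps` is a hypothetical name, and the sketch assumes a warp size of 32 and a 1D launch whose block size is a multiple of 32, so that 32 consecutive global ids share a warp.

```cuda
// Assumption: warp size is 32.
constexpr int kWarpSize = 32;

// Counts warps in which some threads satisfy (id < split) and others do not.
// Assumes global ids [w*32, w*32 + 31] belong to the same warp, which holds
// for a 1D launch whose block size is a multiple of the warp size.
int count_divergent_warps(int total_threads, int split) {
    int divergent = 0;
    for (int first = 0; first < total_threads; first += kWarpSize) {
        int last = first + kWarpSize - 1;
        if (last >= total_threads) last = total_threads - 1;
        // (id < split) is monotone in id, so comparing the first and last
        // thread of the warp is enough to detect a split.
        bool first_takes_if = first < split;
        bool last_takes_if = last < split;
        if (first_takes_if != last_takes_if) ++divergent;
    }
    return divergent;
}
```

Under these assumptions, `count_divergent_warps(512, 256)` reports no split warps, because 256 is a multiple of 32. A threshold such as 250 would split one warp. If no warp is split, the remaining differences between the two versions are the extra compare-and-branch per iteration and the second output array.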
Then I used the CUDA profiler to measure the execution time of the kernel.
Two instructions: 13 ms. One instruction: 11.8 ms.
I want to know why the one-instruction case takes less time. I assumed the two cases would have the same execution time, because in the two-instruction case the two instructions are the same.
I am so confused. Could anybody give me some help?
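For anyone who wants to reproduce the measurement without the profiler, here is a minimal sketch that times both variants with CUDA events. Several things are assumptions rather than details from the post: the kernel names `add_one` and `add_two`, the helper `time_kernel_ms`, the `float` element type, the launch shape of 2 blocks of 256 threads (512 threads in total), and the zero-filled inputs. Error handling is kept minimal.

```cuda
#include <cuda_runtime.h>

// One-instruction variant: every thread writes to c.
__global__ void add_one(const float *a, const float *b, float *c, int n) {
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    while (id < n) {
        for (int k = 0; k < 18000; k++)
            c[id] = a[id] + b[id];
        id += gridDim.x * blockDim.x;
    }
}

// Two-instruction variant: ids below 256 write to c, the rest write to u.
// The grid-stride step is assumed to match the one-instruction variant.
__global__ void add_two(const float *a, const float *b, float *c, float *u, int n) {
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    while (id < n) {
        for (int k = 0; k < 18000; k++) {
            if (id < 256) c[id] = a[id] + b[id];
            else          u[id] = a[id] + b[id];
        }
        id += gridDim.x * blockDim.x;
    }
}

// Returns the elapsed time of one kernel launch in milliseconds,
// or -1.0f if the launch reported an error.
float time_kernel_ms(bool two_branch, int n) {
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *a = nullptr, *b = nullptr, *c = nullptr, *u = nullptr;
    cudaMalloc(&a, bytes);
    cudaMalloc(&b, bytes);
    cudaMalloc(&c, bytes);
    cudaMalloc(&u, bytes);
    cudaMemset(a, 0, bytes);  // zero-filled inputs (assumption)
    cudaMemset(b, 0, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time startup cost is not part of the timing.
    add_one<<<2, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    if (two_branch) add_two<<<2, 256>>>(a, b, c, u, n);
    else            add_one<<<2, 256>>>(a, b, c, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the kernel has finished

    float ms = -1.0f;
    if (cudaGetLastError() == cudaSuccess)
        cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    cudaFree(u);
    return ms;
}
```

Calling `time_kernel_ms(false, 512)` and `time_kernel_ms(true, 512)` several times each and comparing the averages should show whether the gap is stable or within run-to-run noise. The value of `n` is not given in the post, so 512 is only a guess.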
