Hi all,
I am running two instances of the same program simultaneously on a Tesla K40. Each instance launches one kernel with a single block containing a single thread. From the device properties, I found that the Tesla K40 has 15 SMX units, so I expected the two kernels to run on different SMX units and the execution time per kernel to stay the same. The output does not match that expectation: as soon as the second instance starts, the time reported by the first instance almost doubles. The program code and output are below.
Can someone please help me understand this behaviour?
#include <stdio.h>
#define BLOCK_CNT 1
#define THREAD_CNT 1
#define HEIGHT 640
#define WIDTH 480
//Kernel code for some calculation: a single thread walks the whole frame
__global__ void kernel (char *frameBuf, int height, int width)
{
    for (int ind = 0; ind < height; ind++)
    {
        for (int ind1 = 0; ind1 < width; ind1++)
        {
            frameBuf[ind1 + (ind * width)] = frameBuf[ind1 + (ind * width)] * 5;
        }
    }
}
//This program is launched twice; the second instance starts while the first is still running
int main ()
{
    char *frameBufDev = NULL;
    cudaError_t retVal = cudaSuccess;
    cudaEvent_t start, stop;
    float elapsedTime;
    int counter = 0;
    while (counter++ < 100)
    {
        retVal = cudaMalloc (&frameBufDev, HEIGHT * WIDTH);
        if (retVal != cudaSuccess)
        {
            printf ("ERROR : frameBufDev mem allocation failed : %s\n", cudaGetErrorString (retVal));
            return 1; //Do not launch the kernel on an invalid buffer
        }
        cudaMemset (frameBufDev, 1, HEIGHT * WIDTH);
        cudaEventCreate (&start);
        cudaEventCreate (&stop);
        cudaEventRecord (start, 0);
        //Launch the kernel with 1 block and 1 thread to do some processing
        kernel<<<BLOCK_CNT, THREAD_CNT>>> (frameBufDev, HEIGHT, WIDTH);
        retVal = cudaGetLastError ();
        if (retVal != cudaSuccess)
        {
            printf ("ERROR : kernel launch fail : %s\n", cudaGetErrorString (retVal));
        }
        //Wait for the kernel to finish before recording the stop event
        cudaDeviceSynchronize ();
        cudaEventRecord (stop, 0);
        cudaEventSynchronize (stop);
        cudaEventElapsedTime (&elapsedTime, start, stop);
        printf ("Time for execution : %f\n", elapsedTime);
        cudaEventDestroy (start);
        cudaEventDestroy (stop);
        cudaFree (frameBufDev);
    }
    return 0;
}
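For reference, the SMX count mentioned above can be read with a small query like the sketch below. It also prints two other device attributes that may be relevant to this question: whether the device reports support for concurrent kernels, and its compute mode. The device index 0 and the helper name computeModeName are assumptions made for this illustration; they are not part of the program above.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: map the numeric compute mode to a readable label.
static const char *computeModeName (int mode)
{
    switch (mode)
    {
        case cudaComputeModeDefault:          return "Default";
        case cudaComputeModeProhibited:       return "Prohibited";
        case cudaComputeModeExclusiveProcess: return "Exclusive Process";
        default:                              return "Other";
    }
}

int main ()
{
    const int device = 0; // Assumption: the Tesla K40 is device 0
    int smCount = 0, concurrentKernels = 0, computeMode = 0;
    cudaError_t retVal;

    // Query each attribute in turn, stopping at the first failure.
    retVal = cudaDeviceGetAttribute (&smCount, cudaDevAttrMultiProcessorCount, device);
    if (retVal == cudaSuccess)
        retVal = cudaDeviceGetAttribute (&concurrentKernels, cudaDevAttrConcurrentKernels, device);
    if (retVal == cudaSuccess)
        retVal = cudaDeviceGetAttribute (&computeMode, cudaDevAttrComputeMode, device);
    if (retVal != cudaSuccess)
    {
        printf ("ERROR : attribute query failed : %s\n", cudaGetErrorString (retVal));
        return 1;
    }

    printf ("Multiprocessor count : %d\n", smCount);
    printf ("Concurrent kernels   : %s\n", concurrentKernels ? "yes" : "no");
    printf ("Compute mode         : %s\n", computeModeName (computeMode));
    return 0;
}
```

Note that the concurrent-kernels attribute describes kernels launched within one CUDA context; it says nothing by itself about kernels launched from two separate processes.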
The following is the output with the two instances:
Instance one:
Time for execution : 56.887390 //running alone
Time for execution : 56.567455
Time for execution : 56.790207
Time for execution : 113.556671 //second instance started
Time for execution : 113.160355
Time for execution : 113.558113
Time for execution : 113.547363
Time for execution : 113.130402
Time for execution : 56.485153 //second instance stopped
Time for execution : 56.722816
Instance Two:
Time for execution : 56.642815
Time for execution : 56.447166
Time for execution : 56.636543
Time for execution : 56.874016
Time for execution : 56.644161
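One comparison that might help narrow this down is to launch the same kernel twice from a single process, in two separate streams, and time the pair. This is only a sketch, and it rests on an assumption that the output above does not prove: that the doubling comes from the two processes using separate CUDA contexts, whose kernels the GPU time-slices rather than runs side by side. If that assumption holds, the two kernels here could overlap and the total time would stay close to the time of one kernel. The two-buffer, two-stream layout is illustrative and is not part of the original program.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define HEIGHT 640
#define WIDTH 480

// Same single-thread kernel as in the program above.
__global__ void kernel (char *frameBuf, int height, int width)
{
    for (int ind = 0; ind < height; ind++)
        for (int ind1 = 0; ind1 < width; ind1++)
            frameBuf[ind1 + (ind * width)] = frameBuf[ind1 + (ind * width)] * 5;
}

int main ()
{
    char *buf[2] = { NULL, NULL };
    cudaStream_t stream[2];
    cudaEvent_t start, stop;
    float elapsedTime = 0.0f;
    cudaError_t retVal;

    // One buffer and one stream per kernel, all inside a single process.
    for (int i = 0; i < 2; i++)
    {
        retVal = cudaMalloc (&buf[i], HEIGHT * WIDTH);
        if (retVal != cudaSuccess)
        {
            printf ("ERROR : mem allocation failed : %s\n", cudaGetErrorString (retVal));
            return 1;
        }
        cudaMemset (buf[i], 1, HEIGHT * WIDTH);
        cudaStreamCreate (&stream[i]);
    }
    cudaEventCreate (&start);
    cudaEventCreate (&stop);

    // Make sure setup work has finished before timing starts.
    cudaDeviceSynchronize ();
    cudaEventRecord (start, 0);
    for (int i = 0; i < 2; i++)
        kernel<<<1, 1, 0, stream[i]>>> (buf[i], HEIGHT, WIDTH);
    retVal = cudaGetLastError ();
    if (retVal != cudaSuccess)
        printf ("ERROR : kernel launch fail : %s\n", cudaGetErrorString (retVal));

    // Wait for both kernels, then close the timed interval.
    cudaDeviceSynchronize ();
    cudaEventRecord (stop, 0);
    cudaEventSynchronize (stop);
    cudaEventElapsedTime (&elapsedTime, start, stop);
    printf ("Time for two kernels in two streams : %f\n", elapsedTime);

    cudaEventDestroy (start);
    cudaEventDestroy (stop);
    for (int i = 0; i < 2; i++)
    {
        cudaStreamDestroy (stream[i]);
        cudaFree (buf[i]);
    }
    return 0;
}
```

A result near the single-kernel time would point towards the per-process context explanation; a result near double would suggest the cause lies elsewhere.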
Thanks in advance.