Error running code that works in emulation mode

arjun53 · July 11, 2008, 2:45pm

Hello everyone,

I have been struggling with this problem for a while now and am almost at my wits’ end. Please help!

Basically, the code shown below is part of an imaging process; specifically, it accelerates the gridding section. It runs as expected in device emulation mode and returns the right answer, but seems to stop before completing execution when run on the device. All I’ve done is expand what was originally a nested for loop to run on nSamples blocks, each with nChan threads.

I have a GeForce 8600GTS and am running Ubuntu 7.10 with the CUDA 2.0beta2 toolkit and the 177.13 driver.

Any help will be much appreciated!

nSamples = 1000;
nChan = 16;

// Copy data to GPU memory
.....

  // run the code using 1000 blocks of 16 threads each
  dim3 dimGrid(nSamples, 1, 1);
  dim3 dimBlock(nChan, 1, 1);

  start = clock();

  kernel<<< dimGrid, dimBlock >>>((float*)d_u, (float*)d_v, (float*)d_w, (cuComplex*)d_data, (cuComplex*)d_grid, (float*)d_freq, (int*)d_cOffset, (float*)d_C, (int*)d_ints, (float*)d_flts);
  cudaThreadSynchronize();

  // check if kernel execution generated an error
  CUT_CHECK_ERROR("Kernel execution failed");

  CUDA_SAFE_CALL(cudaMemcpy(ints, d_ints, intsMemSize, cudaMemcpyDeviceToHost));
  CUDA_SAFE_CALL(cudaMemcpy(flts, d_flts, fltsMemSize, cudaMemcpyDeviceToHost));

  finish = clock();

  printf("    Count = %d \n\n", ints[4]);

  // Report on timings
  printf("    Total weight = %e \n", flts[1]);
  time = (double(finish)-double(start))/CLOCKS_PER_SEC;
  printf("    Time %f(s) \n", time);

....

__global__ void
kernel(float* u, float* v, float* w, cuComplex* data, cuComplex* grid, float* freq, int* cOffset, float* C, int* ints, float* flts)
{
  // Parameters passed in through the flts and ints arrays
  float cellSize = flts[0];
  float sumwt = flts[1];
  float sumviswt = flts[2];
  int nChan = ints[0];
  int overSample = ints[1];
  int gSize = ints[2];
  int support = ints[3];

  // One block per sample, one thread per channel
  int i = blockIdx.x;
  int chan = threadIdx.x;

  int find, coff, iu, fracu, iv, fracv, suppv, suppu, vind, gind;
  float uScaled, vScaled, wt;
  int cSize=2*(support+1)*overSample+1;
  int cCenter=(cSize-1)/2;

  // Flat index of this (sample, channel) pair
  find=i*nChan+chan;
  coff=cOffset[find];

  uScaled=freq[chan]*u[i]/cellSize;
  iu=(int)(uScaled);
  fracu=(int)(overSample*(uScaled-(float)(iu)));
  iu+=gSize/2;

  vScaled=freq[chan]*v[i]/cellSize;
  iv=(int)(vScaled);
  fracv=(int)(overSample*(vScaled-(float)(iv)));
  iv+=gSize/2;

  for (suppv=-support;suppv<+support;suppv++)
  {
     vind=cSize*(fracv+overSample*suppv+cCenter)+fracu+cCenter+coff;
     gind=iu+gSize*(iv+suppv);
     for (suppu=-support;suppu<+support;suppu++)
     {
        wt=C[vind+overSample*suppu];
        grid[gind+suppu][0]+=wt*data[find][0];
        sumwt+=wt;
     }
  }

  // Write the results back through the same arrays
  flts[0] = cellSize;
  flts[1] = sumwt;
  flts[2] = sumviswt;
  ints[4]++;

  __syncthreads();
}
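One thing worth checking in a kernel like this, offered as an assumption rather than a confirmed diagnosis: every thread increments `ints[4]` and writes `flts[1]` without any synchronisation. Emulation mode runs the threads one after another, so such updates look correct there, while on the device concurrent threads can overwrite each other. The sketch below isolates that pattern with a plain counter; `countRacy`, `countAtomic` and `countThreads` are hypothetical names used only for illustration, and the code assumes a toolkit and device where integer `atomicAdd` on global memory is available.

```cuda
#include <cuda_runtime.h>

// Unsafe: each thread does a separate read, add and write on the same
// global counter, so concurrent threads can overwrite each other's updates.
__global__ void countRacy(int* counter)
{
    counter[0]++;
}

// Safer: atomicAdd performs the read-modify-write as one indivisible step.
__global__ void countAtomic(int* counter)
{
    atomicAdd(counter, 1);
}

// Launches countAtomic on blocks x threads threads and returns the final
// counter value, or -1 if any CUDA call fails.
static int countThreads(int blocks, int threads)
{
    int* d_counter = nullptr;
    int h_counter = -1;
    if (cudaMalloc(&d_counter, sizeof(int)) != cudaSuccess)
        return -1;
    bool ok = cudaMemset(d_counter, 0, sizeof(int)) == cudaSuccess;
    if (ok) {
        countAtomic<<<blocks, threads>>>(d_counter);
        // The blocking copy waits for the kernel and reports its errors.
        ok = cudaMemcpy(&h_counter, d_counter, sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_counter);
    return ok ? h_counter : -1;
}
```

With the launch shape from the question, `countThreads(1000, 16)` is meant to return 16000, whereas the same launch of `countRacy` may report fewer. The same caveat would apply to the `+=` on `grid` whenever two threads touch the same cell. Floating-point `atomicAdd` on global memory only arrived with compute capability 2.0, so on older cards the float sums would need another approach, such as per-thread partial results that are added up afterwards.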

jordyvaneijk · July 14, 2008, 8:36am

Put this right after the kernel invocation to see whether something is wrong with your kernel:

	cudaThreadSynchronize();
	cudaError_t error = cudaGetLastError();
	if (error != cudaSuccess)
		printf("error :%s\n",cudaGetErrorString(error));
	// check if kernel execution generated an error
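The same check can be wrapped in a small helper so that it is easy to repeat after every launch. This is only a sketch: `checkCuda` and `checkKernel` are hypothetical names, not part of the CUDA toolkit or cutil, and `cudaDeviceSynchronize` is the later name for what this thread calls `cudaThreadSynchronize`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not a toolkit function): prints a message and
// returns false when err is anything other than cudaSuccess.
static bool checkCuda(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Call right after a kernel launch. Launches are asynchronous, so an error
// can surface from the launch itself (cudaGetLastError) or only once the
// kernel has actually run (the synchronize call).
static bool checkKernel(const char* name)
{
    bool launched = checkCuda(cudaGetLastError(), name);
    bool finished = checkCuda(cudaDeviceSynchronize(), name);
    return launched && finished;
}
```

After `kernel<<< dimGrid, dimBlock >>>(...)`, a call such as `checkKernel("kernel")` should report a failed launch or an aborted kernel instead of letting the program carry on with stale results.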

andrew_cooke · July 15, 2008, 7:20pm

I’m a complete newbie to this, so take this with a grain of salt, and apologies if it’s obvious. I read somewhere that there’s a limit on the time spent in a kernel call (5 seconds?). What you describe could be that.

tmurray · July 15, 2008, 8:16pm

On Windows, if you’re using a display card for CUDA and a kernel execution goes above ~5s, Windows will kill the execution of the kernel. (A display card is not necessarily just any graphics card: a GeForce card without the desktop extended onto it does not count as a display card.) This is to prevent rogue apps or driver bugs from hanging the system, but it’s pretty annoying for CUDA.

I don’t think a similar mechanism exists in Linux, but I’m not 100% sure.
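Whether a run-time limit applies to a given card can be queried from the runtime rather than guessed. A minimal sketch, assuming a CUDA toolkit recent enough to provide `cudaDeviceGetAttribute` and `cudaDevAttrKernelExecTimeout` (both postdate the 2.0 beta discussed here); `hasRunTimeLimit` is a hypothetical helper name.

```cuda
#include <cuda_runtime.h>

// Returns 1 if the device enforces a run-time limit on kernels,
// 0 if it does not, and -1 if the query itself fails
// (for example, when no such device is present).
static int hasRunTimeLimit(int device)
{
    int enabled = 0;
    cudaError_t err = cudaDeviceGetAttribute(
        &enabled, cudaDevAttrKernelExecTimeout, device);
    if (err != cudaSuccess)
        return -1;
    return enabled ? 1 : 0;
}
```

A result of 1 for `hasRunTimeLimit(0)` would mean that long-running kernels on device 0 can be killed by the watchdog.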

shinkee · July 16, 2008, 12:59am

I think this mechanism exists in Ubuntu. Last time I tried putting a very long loop in my kernel, the program just gave me an execution timeout after a few seconds.

Sometimes it just freezes my computer and I had to do a hard reboot.
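If a single long loop is what trips the timeout, one common workaround is to split the loop across several short launches so that no single kernel runs for long. The following is a rough sketch under that assumption; `accumulateChunk` and `runInChunks` are hypothetical names, the `+= 1.0f` is a stand-in for the real per-iteration work, and `d_out` is assumed to hold one float per launched thread.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each launch handles iterations [begin, end) of the
// long loop and keeps its running total in out[tid] between launches.
__global__ void accumulateChunk(float* out, int begin, int end)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float sum = out[tid];
    for (int i = begin; i < end; ++i)
        sum += 1.0f;  // stand-in for the real work of iteration i
    out[tid] = sum;
}

// Runs `total` iterations as several launches of at most `perLaunch`
// iterations each. Returns the number of launches, or -1 on error.
static int runInChunks(float* d_out, int blocks, int threads,
                       int total, int perLaunch)
{
    if (perLaunch <= 0)
        return -1;
    int launches = 0;
    for (int begin = 0; begin < total; begin += perLaunch) {
        int end = (total - begin < perLaunch) ? total : begin + perLaunch;
        accumulateChunk<<<blocks, threads>>>(d_out, begin, end);
        // Wait for this chunk so each launch stays short on its own.
        if (cudaDeviceSynchronize() != cudaSuccess)
            return -1;
        ++launches;
    }
    return launches;
}
```

How many iterations fit safely into one launch depends on the card and the work per iteration, so `perLaunch` would have to be tuned by experiment.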

andrew_cooke · July 19, 2008, 11:45pm

It’s described in the Linux release notes at http://developer.download.nvidia.com/compu…ux_2.0beta2.txt (just reading them now and remembered this thread).
