Abort kernel

Any idea how I can abort the currently running kernel and force it to return the results to the CPU?

Let me explain a bit more… I’m making a raycaster which has the following function:

__device__ bool HitTest ( const sTRIANGLE* tris, float3 rayOrig, float3 rayDir );

It just tests whether the ray hits any triangle in a set loaded with cuMalloc() into a 1D linear buffer.

The .cu kernel just iterates over the triangle set until a hit is found.

If the ray hits any triangle, I need to abort all the other tests and stop the kernel (as an optimization).

I’m using 128 threads per block, with each block covering 1024 triangles. The triangle set is a 6M-polygon mesh (72 MB).
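
For context, the upload is roughly this (just a sketch using the runtime API; hostTris and nTris are placeholder names for my host-side data):

sTRIANGLE* devTris = 0;
cudaMalloc( (void**)&devTris, nTris * sizeof(sTRIANGLE) );                          // 1D linear buffer on the device
cudaMemcpy( devTris, hostTris, nTris * sizeof(sTRIANGLE), cudaMemcpyHostToDevice ); // copy the mesh up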

I could do:

__device__ bool hitFound = false;

__device__ bool HitTest ( const sTRIANGLE* tris, float3 rayOrig, float3 rayDir )
{
    if ( hitFound )
    {
        return true;
    }
    else
    {
        // perform the ray-triangle test
        // blah blah.. some boring maths...
        if ( testGaveHit )
        {
            hitFound = true;
            return true;
        }
    }
    return false;
}

But I bet that won’t be very efficient… Is there any built-in function to force the kernel to stop executing?

Any ideas on how to implement this better, please? Or do I just need to divide the work into smaller launches and test the result block by block instead (like it is done in the 1dhaar example)?

Could you include a basic parallelized raycaster in the SDK examples?

Use a shared memory variable for “hitFound”. Use thread 0 to initialize it to false.

Threads should ONLY write to it if they find a hit; if they do, they write true. After the write to it, __syncthreads(). Then have all threads check it and return if it is true, or continue if it is not.

Even though multiple threads will be writing the variable at once, the only possible value that they are writing is “true”, so if it is ever true, you know all threads should quit.

Let me know if this isn’t clear and I will write some pseudocode.

Mark

Something like this???

#define BLOCK_SIZE        1024
#define THREADS_PER_BLOCK 128

struct sRAY
{
    float3 orig, dir;
};

struct sTRIANGLE_VERTEX
{
    float x, y, z;
};

struct sTRIANGLE
{
    sTRIANGLE_VERTEX v[3];
};

__shared__ bool hitFound;

__global__ void InitHitFound ()
{
    hitFound = false;
}

__global__ void HitTestGPU ( const sTRIANGLE* tris, const sRAY* r, bool* res )
{
    if ( !hitFound )
    {
        const sTRIANGLE* tri = &tris[(blockIdx.x*BLOCK_SIZE) + threadIdx.x];

        // blah blah maths....

        if ( rayHitsTri )
        {
            hitFound = true;
            __syncthreads();
            *res = true;
        }
    }
}

void HitTest ( const sTRIANGLE* tris, const unsigned int nTris, const sRAY* r, bool* res )
{
    InitHitFound<<<1,1>>>(); // This will init the shared hitFound to false

    const unsigned int nBlocks = (unsigned int)(ceilf((float)nTris/(float)BLOCK_SIZE));
    HitTestGPU<<<nBlocks,THREADS_PER_BLOCK>>> (tris,r,res);
}

???

Shared data has the lifetime of a launch, hence you cannot use one launch to initialize data used in a subsequent launch. I think what Mark means is something like:

__shared__ int x;

__global__ void f(void)
{
    if (threadIdx.x == 0) {
        x = 0;
    }
    __syncthreads();

    // from here on, every thread can access x
}
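
Applied to your HitTestGPU kernel, that pattern might look roughly like the sketch below. This is only a sketch: rayHitsTriangle() is a placeholder for your intersection maths, nTris is assumed to be passed in for the bounds check, res is assumed to be cleared to false before the launch, and hitFound is marked volatile so the repeated reads inside the loop are not optimized away.

__device__ bool rayHitsTriangle ( const sTRIANGLE* tri, const sRAY* r )
{
    // placeholder for the real ray-triangle intersection maths
    return false;
}

__global__ void HitTestGPU ( const sTRIANGLE* tris, unsigned int nTris, const sRAY* r, bool* res )
{
    __shared__ volatile bool hitFound;          // per-block flag, lives only for this launch

    if ( threadIdx.x == 0 )
    {
        hitFound = false;                       // thread 0 initializes it
    }
    __syncthreads();                            // every thread now sees hitFound == false

    // Each block owns BLOCK_SIZE consecutive triangles; each thread strides through them.
    const unsigned int first = blockIdx.x * BLOCK_SIZE;
    const unsigned int last  = ( first + BLOCK_SIZE < nTris ) ? first + BLOCK_SIZE : nTris;

    for ( unsigned int i = first + threadIdx.x; i < last; i += blockDim.x )
    {
        if ( hitFound )                         // some thread in this block already hit: stop looping
        {
            break;
        }
        if ( rayHitsTriangle( &tris[i], r ) )
        {
            hitFound = true;                    // only ever written as true, so the race is benign
            *res     = true;                    // global result the host copies back
        }
    }
}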

But will that be efficient? The threads in each block will still execute the “if ( 0==threadIdx.x )” and the “if ( !hitFound )” checks for all 6M triangles… that just skips some maths inside the ray-triangle test branch… Wouldn’t a built-in instruction to abort the currently executing kernel (abort all the threads in the block, abort all the blocks) be more efficient?

Another option could be to do consecutive block tests on the host side, like:

void HitTest ( const sTRIANGLE* tris, const unsigned int nTris, const sRAY* r, bool* result )
{
    const unsigned int nBlocks = (unsigned int)(ceilf((float)nTris/(float)BLOCK_SIZE));
    bool hostResult = false;

    for ( unsigned int i = 0; i < nBlocks && !hostResult; ++i )
    {
        // test the next block of BLOCK_SIZE triangles
        HitTestGPU<<<1,THREADS_PER_BLOCK>>> ( tris + i*BLOCK_SIZE, r, result );

        // result lives in device memory, so copy it back before the host can test it
        cudaMemcpy( &hostResult, result, sizeof(bool), cudaMemcpyDeviceToHost );
    }
}

But I bet this will kill performance and lose the benefits of CUDA?
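
For reference, a rough sketch of combining the two ideas already in this thread: the global __device__ flag from the first post (so later blocks skip the maths) plus chunked launches (so the host can stop early). It is only a sketch: rayHitsTri stands in for the intersection maths, blocksPerChunk is an arbitrary pick, and cudaMemcpyToSymbol/cudaMemcpyFromSymbol are used to reset and read the flag between launches.

__device__ bool d_hitFound;   // global flag: checked by every block, only ever set to true

__global__ void HitTestChunkGPU ( const sTRIANGLE* tris, unsigned int n, const sRAY* r )
{
    const unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;

    if ( d_hitFound || idx >= n )               // hit already found, or past the end of this chunk
    {
        return;
    }

    const sTRIANGLE* tri = &tris[idx];
    // ... ray-triangle maths using tri and r ...
    bool rayHitsTri = false;                    // placeholder for the result of the maths

    if ( rayHitsTri )
    {
        d_hitFound = true;                      // benign race: the only value ever written is true
    }
}

void HitTest ( const sTRIANGLE* tris, const unsigned int nTris, const sRAY* r, bool* hit )
{
    bool h = false;
    cudaMemcpyToSymbol( d_hitFound, &h, sizeof(bool) );           // reset the flag on the device

    const unsigned int blocksPerChunk = 64;                       // arbitrary; tune for the hardware
    const unsigned int chunk   = blocksPerChunk * THREADS_PER_BLOCK;
    const unsigned int nChunks = (nTris + chunk - 1) / chunk;

    for ( unsigned int i = 0; i < nChunks && !h; ++i )
    {
        const unsigned int remaining = nTris - i*chunk;
        const unsigned int count     = ( remaining < chunk ) ? remaining : chunk;

        HitTestChunkGPU<<<blocksPerChunk, THREADS_PER_BLOCK>>>( tris + i*chunk, count, r );
        cudaMemcpyFromSymbol( &h, d_hitFound, sizeof(bool) );     // read the flag back between launches
    }

    *hit = h;   // hit is assumed to point to host memory here
}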