nvcc keeps dying on me

nvcc -O3 -lstdc++ -rdc=true -gencode arch=compute_50,code=sm_50 -maxrregcount 32 -c tetra.cu

nvcc error   : 'ptxas' died due to signal 11 (Invalid memory reference)
nvcc error   : 'ptxas' core dumped
Makefile:18: recipe for target 'tetra.o' failed

This is the error I keep getting, and I only get it when I add one simple calculation:

const int off = (end - begin) * num_points;

Last time I posted this, people asked for my entire source code. Unfortunately, it's kind of a beast, so let's start with just the kernels. I'm also using dynamic parallelism, and I'm not sure whether that has anything to do with it.

I hate updating GitHub, so before I do that, here's my parent kernel:

__global__
void redistribute_points(const int          fract_num_buckets,
                         const int         *tetra_bucket_starts,
                         const int         *fract_bucket_starts, 
                         const int         *fl,
                         const tetrahedron *mesh,
                         const point       *points,
                         const float       *predConsts,
                               int         *pa,
                               int         *ta,
                               int         *fs,
                               int         *la)
{
    const int thread_id = threadIdx.x + blockIdx.x * blockDim.x;

    // for every fracture bucket (grid-stride loop sized from the actual launch)
    for (int tid = thread_id; tid < fract_num_buckets; tid += blockDim.x * gridDim.x)
    {
        // bucket id = tid

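        // number of points
        const int num_points = tetra_bucket_starts[tid] -
                               (tid > 0 ? tetra_bucket_starts[tid - 1] : 0);

        // an empty bucket would mean a zero-block child launch, which is invalid
        if (num_points <= 0)
            continue;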

        // start of point indices...
        const int *point_starts = pa +
                                  (tid > 0 ? tetra_bucket_starts[tid - 1] : 0);

        // begin/end of bucket
        const int begin = (tid > 0 ? fract_bucket_starts[tid - 1] : 0);
        const int end   = fract_bucket_starts[tid];

        const int off = (end - begin) * num_points;

        //printf("Proposed offsets in association arrays : %d\n", off);

        // iterate fracture buckets...
        for (int i = begin; i < end; ++i)
        {
            // get the tetrahedron
            const tetrahedron *t = mesh + fl[i];

            // ceiling division: just enough blocks of tpb threads to cover num_points
            const int blc = (num_points + tpb - 1) / tpb;

            calculate_point_info<<<blc, tpb>>>
                                (num_points,
                                 t,
                                 points,
                                 point_starts,
                                 predConsts,
                                 pa,
                                 ta,
                                 fs,
                                 la);
        }
    }
}
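The parent kernel reads bucket boundaries from inclusive prefix sums (entry i holds the end of bucket i, and bucket 0 starts at 0) and sizes the child grid from the point count. The sketch below pulls both calculations into small host/device helpers so they can be checked on the CPU. The names `bucket_begin`, `bucket_count`, `child_blocks` and the `HD` macro are hypothetical, not from the original source. Note that the original expression `(num_points / tpb) + num_points % tpb` is not a ceiling division: for 300 points and 256 threads per block it gives 1 + 44 = 45 blocks instead of 2.

```cuda
#include <cassert>

// HD is a hypothetical helper macro: host+device under nvcc, plain C++ otherwise.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// starts[] is an inclusive prefix sum of bucket sizes:
// bucket i occupies [starts[i-1], starts[i]), and bucket 0 begins at 0.
HD inline int bucket_begin(const int *starts, int i)
{
    return i > 0 ? starts[i - 1] : 0;
}

// Number of entries in bucket i.
HD inline int bucket_count(const int *starts, int i)
{
    return starts[i] - bucket_begin(starts, i);
}

// Smallest number of blocks of tpb threads that covers n items (ceiling division).
// Returns 0 for n <= 0; the caller should skip the launch in that case,
// because a launch with zero blocks is an invalid configuration.
HD inline int child_blocks(int n, int tpb)
{
    assert(tpb > 0);
    return n > 0 ? (n + tpb - 1) / tpb : 0;
}
```

Inside the parent kernel, the launch would then read `const int blc = child_blocks(num_points, tpb);` followed by `if (blc > 0) calculate_point_info<<<blc, tpb>>>(...);`.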

And here's the child kernel:

__global__
void calculate_point_info(const int          num_points,
                          const tetrahedron *tet,
                          const point       *points,
                          const int         *pa_starts,
                          const float       *predConsts,
                                int         *pa,
                                int         *ta,
                                int         *fs,
                                int         *la)
{ 
    const int thread_id = threadIdx.x + blockIdx.x * blockDim.x;

    // for each point in the bucket (grid-stride loop sized from the actual launch)
    for (int tid = thread_id; tid < num_points; tid += blockDim.x * gridDim.x)
    {
        // read in tetrahedron
        const tetrahedron t = *tet;

        // read in points
        const point a = points[t.v[0]];
        const point b = points[t.v[1]];
        const point c = points[t.v[2]];
        const point d = points[t.v[3]];

        // read point
        const point p = points[pa_starts[tid]];

        // orientation of p vs every face
        const int ort0 = orientation(predConsts, d.p, c.p, b.p, p.p); // 321
        const int ort1 = orientation(predConsts, a.p, c.p, d.p, p.p); // 023
        const int ort2 = orientation(predConsts, a.p, d.p, b.p, p.p); // 031
        const int ort3 = orientation(predConsts, a.p, b.p, c.p, p.p); // 012

        // if point is outside tetrahedron, skip it (return would also skip
        // any later points this thread owns in the grid-stride loop)
        if (ort0 < 0 || ort1 < 0 || ort2 < 0 || ort3 < 0)
            continue;

        // write location association
        int x = 0;

        x |= (ort0 << 0);
        x |= (ort1 << 1);
        x |= (ort2 << 2);
        x |= (ort3 << 3);

        printf("%d\n", x);
    }
}
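The child kernel packs the four face orientations into the low four bits of `x`. Assuming `orientation` returns -1, 0 or +1 (its definition is not shown), a set bit k means the point lies strictly on the inner side of face k and a clear bit means it lies on that face's plane. The sketch below states that encoding explicitly; normalizing with `ort > 0` keeps each result to a single bit even if the predicate returned some other positive value. `pack_orientations`, `face_bit` and the `HD` macro are hypothetical names.

```cuda
#include <cassert>

// HD is a hypothetical helper macro: host+device under nvcc, plain C++ otherwise.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Packs four orientation results into bits 0..3. Returns -1 if any result is
// negative (point outside the tetrahedron), mirroring the early-out in
// calculate_point_info.
HD inline int pack_orientations(int ort0, int ort1, int ort2, int ort3)
{
    if (ort0 < 0 || ort1 < 0 || ort2 < 0 || ort3 < 0)
        return -1;
    return ((ort0 > 0) << 0) | ((ort1 > 0) << 1) |
           ((ort2 > 0) << 2) | ((ort3 > 0) << 3);
}

// True when the point is strictly inside with respect to face k (0..3).
HD inline bool face_bit(int packed, int k)
{
    assert(packed >= 0 && k >= 0 && k < 4);
    return ((packed >> k) & 1) != 0;
}
```

For a non-degenerate tetrahedron and an exact predicate, 15 means strictly interior, one clear bit means the point is on a face, two clear bits mean it is on an edge, and three clear bits mean it sits on a vertex.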

I have no idea why this makes the compiler fail… T_T

Sanity check: make sure all your CUDA header files belong to the same CUDA version as the toolchain. I am not sure what the exact dependency is, but I believe it has to do with the declaration of vector types and the like.

If the sanity check comes up clean, a segfault (signal 11, GPF) indicates an internal compiler bug, and you would want to file a bug report via the reporting form linked from the registered CUDA developer website. Most of the segfaults I have encountered in PTXAS over the years were related to advanced optimizations, so as a quick workaround you could try lowering the PTXAS optimization level. The default is -Xptxas -O3, so try -Xptxas -O2, then -Xptxas -O1.

Once you have filed a bug report, the compiler team may be able to provide a more targeted workaround.