nvcc -O3 -lstdc++ -rdc=true -gencode arch=compute_50,code=sm_50 -maxrregcount 32 -c tetra.cu
nvcc error : 'ptxas' died due to signal 11 (Invalid memory reference)
nvcc error : 'ptxas' core dumped
Makefile:18: recipe for target 'tetra.o' failed
This is the error I keep getting, and I only get it when I add one simple calculation:
const int off = (end - begin) * num_points;
Last time I posted this, I was asked for my entire source code. Unfortunately, my source is kind of a beast, so let’s start with just the two kernels. I’m also using dynamic parallelism, so I’m not sure whether that has anything to do with it.
I hate updating GitHub, so before I do that, here’s my parent kernel (bpg and tpb are defined elsewhere in my source):
__global__
void redistribute_points(const int fract_num_buckets,
                         const int *tetra_bucket_starts,
                         const int *fract_bucket_starts,
                         const int *fl,
                         const tetrahedron *mesh,
                         const point *points,
                         const float *predConsts,
                         int *pa,
                         int *ta,
                         int *fs,
                         int *la)
{
    const int thread_id = threadIdx.x + blockIdx.x * blockDim.x;
    // for every fracture bucket...
    for (int tid = thread_id; tid < fract_num_buckets; tid += bpg * tpb)
    {
        // bucket id = tid
        // number of points
        const int num_points = tetra_bucket_starts[tid] -
                               (tid > 0 ? tetra_bucket_starts[tid - 1] : 0);
        // start of point indices...
        const int *point_starts = pa +
                                  (tid > 0 ? tetra_bucket_starts[tid - 1] : 0);
        // begin/end of bucket
        const int begin = (tid > 0 ? fract_bucket_starts[tid - 1] : 0);
        const int end = fract_bucket_starts[tid];
        const int off = (end - begin) * num_points;
        //printf("Proposed offsets in association arrays : %d\n", off);
        // iterate fracture buckets...
        for (int i = begin; i < end; ++i)
        {
            // get the tetrahedron
            const tetrahedron *t = mesh + fl[i];
            const int blc = (num_points / tpb) + num_points % tpb;
            calculate_point_info<<<blc, tpb>>>
                (num_points,
                 t,
                 points,
                 point_starts,
                 predConsts,
                 pa,
                 ta,
                 fs,
                 la);
        }
    }
}
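Side note on the child launch: I'm not sure the block count blc is what I actually meant. (num_points / tpb) + num_points % tpb over-launches for most sizes (100 points at 64 threads per block gives 37 blocks), and a ceiling division is probably what I want. Here is a minimal sketch of the difference. blocks_for, blocks_as_posted and the HOST_DEVICE macro are hypothetical names that are not in my source, and this is only the arithmetic, not a fix for the ptxas crash.

```cpp
#include <cassert>

// Lets the same helpers compile under nvcc (host + device) and under a
// plain C++ compiler. HOST_DEVICE is a made-up macro name for this sketch.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: smallest block count such that
// blocks * threads_per_block >= n, i.e. ceil(n / threads_per_block).
HOST_DEVICE inline int blocks_for(int n, int threads_per_block)
{
    assert(n >= 0 && threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// The expression from the parent kernel as posted, kept for comparison.
HOST_DEVICE inline int blocks_as_posted(int n, int threads_per_block)
{
    return (n / threads_per_block) + n % threads_per_block;
}
```

Both versions return 0 for an empty bucket, and a launch with zero blocks is an invalid configuration, so the launch would also need a guard such as skipping it when num_points is 0.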
And here’s the child kernel:
__global__
void calculate_point_info(const int num_points,
                          const tetrahedron *tet,
                          const point *points,
                          const int *pa_starts,
                          const float *predConsts,
                          int *pa,
                          int *ta,
                          int *fs,
                          int *la)
{
    const int thread_id = threadIdx.x + blockIdx.x * blockDim.x;
    // for each point in the buckets...
    for (int tid = thread_id; tid < num_points; tid += bpg * tpb)
    {
        // read in tetrahedron
        const tetrahedron t = *tet;
        // read in points
        const point a = points[t.v[0]];
        const point b = points[t.v[1]];
        const point c = points[t.v[2]];
        const point d = points[t.v[3]];
        // read point
        const point p = points[pa_starts[tid]];
        // orientation of p vs every face
        const int ort0 = orientation(predConsts, d.p, c.p, b.p, p.p); // 321
        const int ort1 = orientation(predConsts, a.p, c.p, d.p, p.p); // 023
        const int ort2 = orientation(predConsts, a.p, d.p, b.p, p.p); // 031
        const int ort3 = orientation(predConsts, a.p, b.p, c.p, p.p); // 012
        // if point is outside tetrahedron
        if (ort0 < 0 || ort1 < 0 || ort2 < 0 || ort3 < 0)
            return;
        // write location association
        int x = 0;
        x |= (ort0 << 0);
        x |= (ort1 << 1);
        x |= (ort2 << 2);
        x |= (ort3 << 3);
        printf("%d\n", x);
    }
}
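To make the location association at the end of the child kernel easier to follow, here is the bit packing pulled out on its own. It only works as a bitmask if every orientation value is 0 or 1 once the negative cases have been filtered out, so the sketch asserts that. pack_orientations and HOST_DEVICE are hypothetical names for illustration and are not in my source.

```cpp
#include <cassert>

// Same made-up macro as before: compiles under nvcc or a plain C++ compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: pack four per-face orientation results into bits 0..3.
// Assumes each input is 0 or 1; larger or negative values would spill into
// neighbouring bits and corrupt the mask.
HOST_DEVICE inline int pack_orientations(int ort0, int ort1, int ort2, int ort3)
{
    assert((ort0 == 0 || ort0 == 1) && (ort1 == 0 || ort1 == 1) &&
           (ort2 == 0 || ort2 == 1) && (ort3 == 0 || ort3 == 1));
    int x = 0;
    x |= (ort0 << 0); // face 321
    x |= (ort1 << 1); // face 023
    x |= (ort2 << 2); // face 031
    x |= (ort3 << 3); // face 012
    return x;
}
```

With this encoding, 15 means all four orientation tests returned 1, and a cleared bit marks a face whose test returned 0.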
I have no idea why this makes the compiler fail… T_T