Access violation

I’m trying to convert a path tracer to use CUDA instead of running on the CPU. On the way, I’ve run into a problem. I keep getting access violations and I’m not sure why.

I have a kernel called TraceRays:

__global__ void TraceRays(const Camera* cam, const Sphere* shape, Color* result, unsigned width) {
	const unsigned x = blockIdx.x * blockDim.x + threadIdx.x;
	const unsigned y = blockIdx.y * blockDim.y + threadIdx.y;
	const unsigned i = y*width + x;
	const Ray ray = cam->GetRay(x, y);

	Point p = ray(0.5f);

	result[i] = Color();
	float t;
	if (shape->Intersect(ray, t))
		result[i] = Color(0.f, 1.f, 0.f);

NSight’s memory checker tells me that I specify an invalid address on load, at the “if (shape->Intersect(ray, t))” line. However, checking the values with the debugger shows that shape has the correct position and radius.

This is how I allocate them:

Camera* cam = new Camera(...);
Sphere* sphere = new Sphere(Point(0.f, 0.f, 0.f), 2.f);
Camera* d_cam; cudaMalloc(&d_cam, sizeof(Camera));
Sphere* d_sphere; cudaMalloc(&d_sphere, sizeof(Sphere));

cudaMemcpy(d_cam, cam, sizeof(Camera), cudaMemcpyHostToDevice);
cudaMemcpy(d_sphere, sphere, sizeof(Sphere), cudaMemcpyHostToDevice);

To see if it might be that, somehow, it will not call methods of objects on the device, I added that “Point p = ray(0.5f);” line. This seems to be working just fine.

This is how my classes are declared:

class Sphere : public Shape {
	Point c;
	float r;

	Sphere(const Point& c, float r);

	__host__ __device__ bool	Intersect(const Ray& ray, float& t) const;
	__host__ __device__ Vector	GetNormal(const Point& p) const;
class Ray {
	Point		o;
	Vector		d;

	float		mint;
	float		maxt;

	__host__ __device__ Ray(const Point& o, const Vector& d, float mint = 1e-6f, float maxt = INFINITY);
	__host__ __device__ Point		operator()(float t) const;

These are the error messages:

CUDA context created : 042e3c70
CUDA module loaded:   05516e38 C:/Users/Arjan/documents/visual studio 2012/Projects/CUDATest/CUDATest/
CUDA Debugger detected HW exception on 2 warps.  First warp:
blockIdx = {0,0,0}
threadIdx = {0,0,0}
Exception = Out of range Address
PC = 0x0004e858
FunctionRelativePC = 0x00000398

CUDA context created : 00513c70
CUDA module loaded:   05226e38 C:/Users/Arjan/documents/visual studio 2012/Projects/CUDATest/CUDATest/
CUDA Memory Checker detected 128 threads caused an access violation:
Launch Parameters
    CUcontext    = 00513c70
    CUstream     = 0519af40
    CUmodule     = 05226e38
    CUfunction   = 052b6120
    FunctionName = _Z9TraceRaysPK6CameraPK5ShapeP5Colorj
    GridId       = 6
    gridDim      = {128,96,1}
    blockDim     = {8,8,1}
    sharedSize   = 256
        cam = 0x40b80000  {}
        shape = 0x40b80200  {__vptr = 0x013e834c  ???}
        result = 0x40c80000  {r = 0, g = 0, b = 0}
        width = 1024
    Parameters (raw):
         0x40b80000 0x40b80200 0x40c80000 0x00000400

Does anybody know why I get these invalid address on load errors? Am I incorrectly allocating memory for the Sphere object?

Thank you in advance!

The problem is that you are creating the Sphere object on the host, and then trying to copy it to do the device.

new Sphere(Point(0.f, 0.f, 0.f), 2.f);

You need to create it explicitly on the device (launch a kernel that calls new).

It fails because Sphere is a virtual class, and the host compiler implements this by embedding a pointer
to a host function in the Sphere object (__vptr in the memory checker trace). When you copy the Sphere
to the device, the pointer gets copied over. When you try to call the Intersect method on the Sphere it
tries to access the host point, which fails because it is a host pointer.

If you created the Sphere on the device, the embedded pointer would be allocated on the device and
it would work.