I’m trying to convert a path tracer to use CUDA instead of running on the CPU. On the way, I’ve run into a problem. I keep getting access violations and I’m not sure why.
I have a kernel called TraceRays:
__global__ void TraceRays(const Camera* cam, const Sphere* shape, Color* result, unsigned width) {
const unsigned x = blockIdx.x * blockDim.x + threadIdx.x;
const unsigned y = blockIdx.y * blockDim.y + threadIdx.y;
const unsigned i = y*width + x;
const Ray ray = cam->GetRay(x, y);
Point p = ray(0.5f);
result[i] = Color();
float t;
if (shape->Intersect(ray, t))
result[i] = Color(0.f, 1.f, 0.f);
}
NSight’s memory checker tells me that I specify an invalid address on load, at the “if (shape->Intersect(ray, t))” line. However, checking the values with the debugger shows that shape has the correct position and radius.
This is how I allocate them:
Camera* cam = new Camera(...);
Sphere* sphere = new Sphere(Point(0.f, 0.f, 0.f), 2.f);
Camera* d_cam; cudaMalloc(&d_cam, sizeof(Camera));
Sphere* d_sphere; cudaMalloc(&d_sphere, sizeof(Sphere));
cudaMemcpy(d_cam, cam, sizeof(Camera), cudaMemcpyHostToDevice);
cudaMemcpy(d_sphere, sphere, sizeof(Sphere), cudaMemcpyHostToDevice);
To see if it might be that, somehow, it will not call methods of objects on the device, I added that “Point p = ray(0.5f);” line. This seems to be working just fine.
This is how my classes are declared:
class Sphere : public Shape {
public:
Point c;
float r;
Sphere();
Sphere(const Point& c, float r);
__host__ __device__ bool Intersect(const Ray& ray, float& t) const;
__host__ __device__ Vector GetNormal(const Point& p) const;
};
class Ray {
public:
Point o;
Vector d;
float mint;
float maxt;
__host__ __device__ Ray(const Point& o, const Vector& d, float mint = 1e-6f, float maxt = INFINITY);
__host__ __device__ Point operator()(float t) const;
};
These are the error messages:
CUDA context created : 042e3c70
CUDA module loaded: 05516e38 C:/Users/Arjan/documents/visual studio 2012/Projects/CUDATest/CUDATest/camera.cu
CUDA Debugger detected HW exception on 2 warps. First warp:
blockIdx = {0,0,0}
threadIdx = {0,0,0}
Exception = Out of range Address
PC = 0x0004e858
FunctionRelativePC = 0x00000398
CUDA context created : 00513c70
CUDA module loaded: 05226e38 C:/Users/Arjan/documents/visual studio 2012/Projects/CUDATest/CUDATest/camera.cu
================================================================================
CUDA Memory Checker detected 128 threads caused an access violation:
Launch Parameters
CUcontext = 00513c70
CUstream = 0519af40
CUmodule = 05226e38
CUfunction = 052b6120
FunctionName = _Z9TraceRaysPK6CameraPK5ShapeP5Colorj
GridId = 6
gridDim = {128,96,1}
blockDim = {8,8,1}
sharedSize = 256
Parameters:
cam = 0x40b80000 {}
shape = 0x40b80200 {__vptr = 0x013e834c ???}
result = 0x40c80000 {r = 0, g = 0, b = 0}
width = 1024
Parameters (raw):
0x40b80000 0x40b80200 0x40c80000 0x00000400
Does anybody know why I get these invalid address on load errors? Am I incorrectly allocating memory for the Sphere object?
Thank you in advance!