Hi there,
I’ve been using the memcheck tool while developing a ray tracer from scratch.
This early on, Im already getting stuck because I keep getting random errors from the memcheck tool.
Im trying to ray trace a hardcoded (in kernel code) sphere using a moveable camera…
__global__ void cudaRender(
float4* screen,
int pitch,
const Camera::CameraData* camera,
MyStruct* list)
{
// Calculate the global x and y coordinates of this thread + its global (1 dimensional) id.
int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;
int gid = y * pitch + x;
// Generate primary ray
glm::vec2 pixelPosition(x, y);
Ray ray = Camera::generateRay(*camera, pixelPosition);
// Setup hardcoded sphere for the demo
Sphere demoSphere;
demoSphere.position = glm::vec3(0, 0, -5);
demoSphere.radius2 = 4.0f;
// Intersect with the sphere, if we hit we set the output colour to green
glm::vec3 outCol = glm::vec3(0);
if (demoSphere.intersect(ray))
outCol = glm::vec3(0, 1, 0);
// Write the output color to the screen buffer
screen[gid].x = outCol.x;
screen[gid].y = outCol.y;
screen[gid].z = outCol.z;
screen[gid].w = 1.0f;
}
// A GPU function that generates a ray for a given pixel
__device__ Ray Camera::generateRay(const CameraData& data, glm::vec2 pixelPosition)
{
// The camera's u, v vectors span the virtual screen. Their length is depends
// on the aspect ratio and Field of View.
// Calculate the point on the virtual screen through which the ray passes
// See slides 17-20 of:
// http://www.cs.uu.nl/docs/vakken/magr/2016-2017/slides/lecture%2001%20-%20intro%20&%20whitted.pdf
glm::vec2 uv = (pixelPosition / data.screenSize) * 2.0f - 1.0f;
glm::vec3 p = data.screenCenter + uv.x * data.u + uv.y * data.v;
Ray primaryRay;
primaryRay.origin = data.cameraPosition;
primaryRay.direction = glm::normalize(data.cameraPosition - p);
primaryRay.t = 10e+10f;// std::numeric_limits is not supported on device code
return primaryRay;
}
__device__ bool Sphere::intersect(Ray& ray)
{
// Efficient ray/sphere intersection code by Jacco Bikker.
// Presented during the first lecture (slide 21) of Advanced Graphics course (2016-2017)
// at the Utrecht University):
// http://www.cs.uu.nl/docs/vakken/magr/2016-2017/index.html
// http://www.cs.uu.nl/docs/vakken/magr/2016-2017/slides/lecture%2001%20-%20intro%20&%20whitted.pdf
glm::vec3 center = position - ray.origin;// Vector from ray origin to sphere center
float t = glm::dot(center, ray.direction);// Projection of center vector onto the ray direction
glm::vec3 q = center - t * ray.direction;// Component of center vector orthogonal to the ray direction
float p2 = glm::dot(q, q);// Length of q square
if (p2 > radius2)
return false;
t -= sqrtf(radius2 - p2);
if ((t < ray.t) && (t > 0))
{
ray.t = t;
return true;
}
return false;
}
struct CameraData
{
glm::vec3 cameraPosition;// "eye" vector (camera's position in world space)
glm::vec3 screenCenter;// Center of the screen plane in world space
glm::vec3 u, v;// Vectors that define the virtual screen plane in world space
glm::vec2 screenSize;// Screen size in pixels
};
The program runs fine without memcheck enabled (no freezes/system crashes).
But when I enable memcheck it keeps giving me errors in random locations in the math (glm vector math) code (both intersection as well as ray generation).
These errors are not thrown at the first invocation of the kernel but sometimes appear only after a couple of seconds.
The camera data is read only (using const pointers) and is only changed in between kernel invocations using cudaMemcpy.
Im almost a 100% sure this is a bug in the memcheck tool as some of the messages really make no sense at all:
Some of the error messages are:
Memory Checker detected 64 access violations.
error = misaligned load (global memory)
gridid = 1
blockIdx = {20,124,0}
threadIdx = {0,0,0}
address = 0x1b8e1fffc60
accessSize = 4
Memory Checker detected 31 access violations.
error = access violation on store (shared memory)
gridid = 31
blockIdx = {9,160,0}
threadIdx = {1,0,0}
address = 0x1dc00fffc40
accessSize = 4
Both of which make no sense to me at all because 0x1b8e1fffc60 is aligned to 32 bytes (and a glm::vec3 are separate floats anyways).
Second of all, Im not using shared memory anywhere in my code so how can I ever get an access violation in shared memory?
The memcheck tool breaks in the glm/helper_math (tried both) operator overloads.
In the case of the second error, according to VS watch window, all variables in the function are locals (and none are in shared memory).
Even when I changed my code to only upload the camera data once at the start, I got the same errors.
This is really weird since its basically doing the same calculations every frame, which means it should either crash at the first invocation, or not at all.
For math Im using glm although I did switch to float2/3/4 with helper_math.h (from the CUDA samples) but this didnt help either.
I reinstalled the driver, cuda toolkit and nsight visual studio plugin.
Driver: 378.66
Nsight: 5.2.0.16268
CUDA: 8.0.61
OS: Windows 10 Pro
GPU: GTX1050
This problem also occurs on my laptop (GTX1050 as well).