Debugger error.

I’m writing a simple ray tracer and ran into a strange issue while testing. I switched to Debug mode (in VS 2015), and the output picture became really glitchy and nothing like what I get in Release mode. The Release build gives me a half-decent image, while the Debug build produces something that barely resembles the intended shapes, with a lot of missing parts and no reflections. Is there any difference in floating-point arithmetic, or anything like that, between the two modes?

The part of the code that is probably “misbehaving” is this:

```cpp
template<typename HitType>
struct PixelRenderFrame {
	Photon photon;
	typename ShadedOctree<HitType>::RaycastHit hit;
	ColorRGB color;
	ShaderBounce bounce;
	int bounceId;
};

template<typename HitType, unsigned int MaxStackSize>
struct PixelRenderProcess {
	SceneDataHandles<HitType> world;
	PixelRenderFrame<HitType> stack[MaxStackSize];
	PixelRenderFrame<HitType> *end;
	PixelRenderFrame<HitType> *ptr;
	Pixel::PixelColor output;
	bool renderComplete;

	bool shadeStarted;
	bool midShade;
	Photon savedPhoton;
	int lightId;

	int castsLeft;

	__device__ __host__ inline bool shade() {
		PixelRenderFrame<HitType> &frame = (*ptr);
		if (!shadeStarted) {
			if (castsLeft <= 0) {
				midShade = true;
				return true;
			}
			else {
				midShade = false;
				shadeStarted = true;
			}
			if (frame.photon.dead()) return false;
			if (!>cast(frame.photon.ray, frame.hit, false)) return false;

			frame.color(0.0f, 0.0f, 0.0f);

			if ((ptr + 1) != end) {
				ShaderBounceInfo<HitType> bounceInfo = { frame.hit.object.object, frame.photon, frame.hit.hitPoint };
				frame.hit.object.material->bounce(bounceInfo, &frame.bounce);
			}
			else frame.bounce.count = 0;
			frame.bounceId = 0;
		}

		const Stacktor<Light> &lights = (*world.lights);
		for (int i = lightId; i < lights.size(); i++) {
			bool noShadows;
			Photon p;
			if (midShade) {
				noShadows = false;
				p = savedPhoton;
			}
			else {
				noShadows = false;
				p = lights[i].getPhoton(frame.hit.hitPoint, &noShadows);
				if (p.dead()) continue;
			}
			if (!noShadows) {
				if (castsLeft > 0) {
					typename ShadedOctree<HitType>::RaycastHit lightHit;
					if (>cast(p.ray, lightHit, false)) {
						if ((frame.hit.hitPoint - lightHit.hitPoint).sqrMagnitude() <= 128.0f * VECTOR_EPSILON)
							noShadows = true;
					}
					else noShadows = true;
					midShade = false;
				}
				else {
					midShade = true;
					savedPhoton = p;
					lightId = i;
					return true;
				}
			}
			if (noShadows) {
				ShaderHitInfo<HitType> castInfo = { frame.hit.object.object, p, frame.hit.hitPoint, frame.photon.ray.origin };
				frame.color += frame.hit.object.material->illuminate(castInfo).color;
			}
		}
		frame.color *= frame.photon.color;

		shadeStarted = false;
		midShade = false;
		lightId = 0;
		return true;
	}

	__device__ __host__ inline bool iterate() {
		while (true) {
			if (midShade || (ptr->bounceId < ptr->bounce.count)) {
				if (!midShade) {
					Photon sample = ptr->bounce.samples[ptr->bounceId];
					sample.ray.origin += sample.ray.direction * (128.0f * VECTOR_EPSILON);
					ptr->photon = sample;
				}
				if (!shade()) {
					if (ptr != stack) ptr--;
					else {
						output.depth = -1;
						renderComplete = true;
						return true;
					}
				}
				else if (midShade) return false;
			}
			else if (ptr == stack) {
				output.color = ptr->color;
				output.depth = ptr->hit.hitDistance;
				renderComplete = true;
				return true;
			}
			else {
				ColorRGB col = ptr->color;
				ptr->color += col;
			}
		}
	}

	__device__ __host__ inline void setup(const Photon &photon, const SceneDataHandles<HitType> &world) {
		this->world = world;
		end = (stack + MaxStackSize);
		ptr = stack;
		ptr->photon = photon;
		shadeStarted = false;
		midShade = true;
		lightId = 0;
		renderComplete = false;
		castsLeft = 0;
	}
};

// Code below is temporary and will be changed in the future
template<typename HitType, unsigned int MaxStackSize>
__device__ __host__ inline static Pixel::PixelColor renderPixel(const Photon &photon, const SceneDataHandles<HitType> &world, PixelRenderProcess<HitType, MaxStackSize> *stack) {
	PixelRenderProcess<HitType, MaxStackSize> pixel;
	pixel.setup(photon, world);
	while (!pixel.renderComplete) {
		pixel.castsLeft = 1; // this variable will be used to limit raycasts per kernel in the future, to prevent crashes.
	}
	return pixel.output;
}
```

There can be numerical differences between debug and release builds. For example, contracting floating-point multiplies and adds into FMA (fused multiply-add) instructions is an optimization, and in debug mode all optimizations are turned off. Contraction can often change numerical results: an FMA computes a*b+c with a single rounding instead of two, which reduces rounding error and preserves bits in cases of subtractive cancellation, usually leading to more accurate results.
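To make the FMA point concrete, here is a small host-side C++ sketch using only the standard std::fma; the function names mulThenAdd and fusedMulAdd are made up for illustration. With a = 1 + 2^-12, the exact product a*a = 1 + 2^-11 + 2^-24 lies exactly halfway between two floats and rounds (to even) to 1 + 2^-11, so subtracting 1 + 2^-11 afterwards leaves 0. The fused version rounds only once and keeps the 2^-24 term.

```cpp
#include <cassert>
#include <cmath>

// a*b is rounded to float first, then added to c: two roundings.
// Storing the product in a volatile float keeps the compiler from
// contracting the expression into an FMA.
float mulThenAdd(float a, float b, float c) {
    volatile float product = a * b;  // first rounding
    return product + c;              // second rounding
}

// a*b + c is computed as if exactly and rounded once (fused multiply-add).
float fusedMulAdd(float a, float b, float c) {
    return std::fma(a, b, c);
}
```

For example, mulThenAdd(1.000244140625f, 1.000244140625f, -1.00048828125f) returns 0.0f, while fusedMulAdd with the same arguments returns 2^-24 (about 5.96e-8). In device code, nvcc's --fmad=false option turns this contraction off, which can be a quick way to check whether it matters for a particular kernel.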

What you describe doesn’t sound like a consequence of numerical differences, though. It is more consistent with race conditions in the code that get exposed by the different code timing.

When you run your code under the control of cuda-memcheck, are any issues reported? Note that cuda-memcheck cannot find all race conditions.

Thanks for the information about FMA. Some time ago I had problems with floats not being accurate enough, and I didn’t know about optimizations like that.
I don’t think the problem has to do with race conditions: every variable that is shared between threads remains intact during the whole process, each thread has its own instance of the struct above on its stack, and each thread updates only one pixel (at least for now).
For comparison, you can see the image I attached to the post. The one on the left is the Debug output, and the one that shows the image properly is my Production build.

I checked with cuda-memcheck, and it didn’t detect any errors in the Release build, while the Debug build simply crashed after reaching the timeout (the 5-second-or-so limit for kernels; since my test renders over and over again until I close the window, changing the timeout wouldn’t give me a chance to test anyway). So we may assume it’s not a memory issue.

I changed the kernel size, and this time the Debug test didn’t crash when run under cuda-memcheck. No errors were detected, just as expected, and the image still looks terrible.
I’ll simply settle for the idea that it’s a numerical issue. The Octree<> class, which is responsible for raycasts, uses functions that allow a really small error (0.000005f) in most of the calculations, and the Debug build’s lack of optimizations likely causes cast() to fail and produce random results. This still seems strange, though, considering that the test of the Octree itself, which also renders some things (without lights and/or reflections/refractions), works just fine in both modes.
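One way to see why a fixed absolute tolerance such as 0.000005f can make results sensitive to small rounding differences: the spacing between adjacent floats grows with magnitude. From 64 upward, one ulp (2^-17, about 7.6e-6) is already larger than the tolerance, so a comparison against 0.000005f effectively becomes an exact comparison there. Whether this affects Octree<> depends on the scene's coordinate range, which is an assumption here. The C++ sketch below measures the spacing with std::nextafter; nearlyEqual is a hypothetical helper, not part of the code above, showing a common absolute-plus-relative tolerance.

```cpp
#include <cassert>
#include <cmath>

// Gap between x and the next larger float: one unit in the last place (ulp).
float ulpAt(float x) {
    return std::nextafter(x, INFINITY) - x;
}

// Hypothetical tolerance check mixing an absolute and a relative bound,
// so the allowed error grows with the magnitude of the operands.
bool nearlyEqual(float a, float b, float absTol, float relTol) {
    float scale = std::fmax(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= std::fmax(absTol, relTol * scale);
}
```

With absTol = 0.000005f and relTol = 0, two neighbouring floats near 100 already compare as unequal; adding a small relative term (for example 1e-6f) keeps the tolerance above the local float spacing.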