Issues with nvcc flag -arch when using anything >= sm_20 (program crashing)

Hi,

I am currently having an issue with my path tracer crashing at runtime when it is compiled with the -arch flag set to anything >= sm_20. I have a GTX Titan in my machine and had been compiling with -arch=sm_35 until recently, and everything worked fine.

So this is what I did before the error occurred:
In the renderer, I want to store all sampled points as some kind of light field samples. This is achieved using the following data type:

class UserType_lightField {
public:
	optix::float4 m_xyzt[LF_DEPTH_WRITE];	// xyz and theta
	optix::float4 m_rgbp[LF_DEPTH_WRITE];	// rgb and phi
	optix::float4 m_ns[LF_DEPTH_WRITE];     // normal and shininess
	bool m_valid[LF_DEPTH_WRITE];
};

LF_DEPTH_WRITE is simply the maximum number of hitpoints that are stored. Now in my ray generation program, I create a UserType_lightField object and store the corresponding data in it, looking something like this:

if (!prd.background) {
	lightField.m_valid[lf_depth] = true;
	lightField.m_xyzt[lf_depth].w = acosf( rayInDir.z );
	lightField.m_rgbp[lf_depth].w = atan2f( rayInDir.y, rayInDir.x );
					
	lightField.m_xyzt[lf_depth].x = prd.origin.x;
	lightField.m_xyzt[lf_depth].y = prd.origin.y;
	lightField.m_xyzt[lf_depth].z = prd.origin.z;
 					
	lightField.m_ns[lf_depth].x = prd.normal.x;
	lightField.m_ns[lf_depth].y = prd.normal.y;
	lightField.m_ns[lf_depth].z = prd.normal.z;
	lightField.m_ns[lf_depth].w = prd.shininess;
 			
	lightField.m_rgbp[lf_depth].x = prd.texCol.x * prd.radiance.x;
	lightField.m_rgbp[lf_depth].y = prd.texCol.y * prd.radiance.y;
	lightField.m_rgbp[lf_depth].z = prd.texCol.z * prd.radiance.z;
}

Also, the data is accumulated for each of these points, i.e., the illumination from the points after a specific point is added to this point (just like a normal path tracer does with the sample value). At the end of the ray generation program, this data is stored in an rtBuffer<UserType_lightField, 2> output_buffer_lightField;
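
To get a feel for how much data one element of that buffer holds as the depth grows, here is a rough host-side sketch. It is not the renderer's code: Float4 is a stand-in for optix::float4 (assumed to be 16 bytes with 16-byte alignment, like CUDA's float4), and LightFieldSampleT and sampleBytes are hypothetical names used only for this illustration.

```cpp
#include <cstddef>

// Stand-in for optix::float4 (assumption: 16 bytes, 16-byte aligned).
struct alignas(16) Float4 { float x, y, z, w; };

// Same member layout as UserType_lightField, with the depth as a template
// parameter so that several depths can be compared side by side.
template <std::size_t Depth>
struct LightFieldSampleT {
    Float4 m_xyzt[Depth];   // xyz and theta
    Float4 m_rgbp[Depth];   // rgb and phi
    Float4 m_ns[Depth];     // normal and shininess
    bool   m_valid[Depth];
};

// Bytes occupied by one buffer element (and by one local copy of the object
// in the ray generation program) for a given depth, padding included.
template <std::size_t Depth>
constexpr std::size_t sampleBytes() {
    return sizeof(LightFieldSampleT<Depth>);
}
```

Under these assumptions, sampleBytes<1>() is 64 and sampleBytes<4>() is 208, because the trailing bool array is padded up to the next multiple of 16 bytes. Every thread that keeps such an object locally carries that much data, so the footprint grows quickly with the depth.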

Now for the problem:
Everything works fine when compiling without the -arch flag or with -arch up to sm_13. With sm_20 or anything above, the application crashes shortly after start:

Program received signal CUDA_EXCEPTION_10, Device Illegal Address.
[Switching focus to CUDA kernel 225, grid 226, block (47,0,0), thread (21,1,0), device 0, sm 8, warp 26, lane 21]
0x000000000599aa00 in __globfunc__Z7trace_0v<<<(56,1,1),(32,8,1)>>> ()

Kernel and grid seem to be the same every time the program crashes; everything else varies. Somehow this seems to be related to how much data I write in my ray generation program, as just eliminating some writes (even to data unrelated to the light field samples) makes the error disappear.

Please, does someone have any advice on this? Really seems to be a weird error to me and I have no idea what’s wrong at the moment.

Just noticed that when I set the program to use just one GPU (I have two Titans in my machine), the program runs for a few seconds and then freezes -> launch timeout.

Updated to OptiX 3.6.2 and CUDA 6.0; now instead of a crash I get a black screen with >= sm_20. This happens with gcc 4.6 as well as gcc 4.8.

Never mind, I overlooked that I had set wrong array bounds in one place, which resulted in this error. Funny though that it worked with <= sm_13 :)
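
For anyone running into the same kind of array-bounds mistake, a guard along the following lines would refuse an out-of-range write instead of silently going past the end of the arrays. This is only a sketch: Float4 stands in for optix::float4, the value of LF_DEPTH_WRITE is made up (the real one is not given here), and depthInBounds and storeHit are hypothetical helpers, not part of OptiX.

```cpp
#include <cstddef>

// Hypothetical value, chosen only for this illustration.
constexpr std::size_t LF_DEPTH_WRITE = 4;

// Stand-in for optix::float4.
struct Float4 { float x, y, z, w; };

// Mirrors UserType_lightField.
struct LightFieldSample {
    Float4 m_xyzt[LF_DEPTH_WRITE];
    Float4 m_rgbp[LF_DEPTH_WRITE];
    Float4 m_ns[LF_DEPTH_WRITE];
    bool   m_valid[LF_DEPTH_WRITE];
};

// True only if lf_depth is a valid index into the per-hit arrays.
constexpr bool depthInBounds(std::size_t lf_depth) {
    return lf_depth < LF_DEPTH_WRITE;
}

// Stores one hit position; returns false and writes nothing if lf_depth
// would index past the end of the arrays.
inline bool storeHit(LightFieldSample& lf, std::size_t lf_depth, Float4 xyzt) {
    if (!depthInBounds(lf_depth)) {
        return false;
    }
    lf.m_xyzt[lf_depth] = xyzt;
    lf.m_valid[lf_depth] = true;
    return true;
}
```

In device code the same comparison could simply wrap the block of writes in the ray generation program, e.g. by extending the existing condition to if (!prd.background && lf_depth < LF_DEPTH_WRITE).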

Okay, I have to take this back. As soon as I changed the configuration to store samples up to a higher depth (e.g. the first four hitpoints), the problem occurred again. Still working with <= sm_13, crashing with anything higher.

What kind of error message are you getting? Is the problem still with an illegal device address or is it a timeout issue? If the latter, there are plenty of posts that can help you solve it on this board, for instance: Optix Error: Kernel Launch returned.