Strange texture fetch behavior (release vs debug)

I recently added support for transparency in the pathtracer I’ve been working on.
The following piece of code handles transparency in the ray/triangle intersection routine.

CODE 1:

if ( alphaMasked() ) {
	// _device_texture_indices_texobj is a cudaTextureObject_t and resides in constant memory.
	const uint4 tex_idx = tex1Dfetch<uint4>( _device_texture_indices_texobj, tri_idx );
	// The fourth component of tex_idx is the index of the alpha mask image texture.
	// tex_objects is an array of cudaTextureObject_t referencing all image textures (diffuse, spec, bump, and alpha).
	const float alpha = tex2D<float4>( tex_objects[tex_idx.w], uv.x, 1.0f - uv.y ).x;
	if ( alpha != 1.0f ) { return false; }
}
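
For context, here is a minimal sketch of how a texture object like _device_texture_indices_texobj can be set up over a linear buffer of uint4 (one entry per triangle). The helper name and the omission of error checking are mine; the actual setup code may differ:

#include <cuda_runtime.h>

cudaTextureObject_t make_index_texobj( const uint4* d_indices, size_t num_tris )
{
	cudaResourceDesc res_desc = {};
	res_desc.resType                = cudaResourceTypeLinear;
	res_desc.res.linear.devPtr      = const_cast<uint4*>( d_indices );
	res_desc.res.linear.desc        = cudaCreateChannelDesc<uint4>();
	res_desc.res.linear.sizeInBytes = num_tris * sizeof( uint4 );

	cudaTextureDesc tex_desc = {};
	tex_desc.readMode = cudaReadModeElementType;

	cudaTextureObject_t texobj = 0;
	cudaCreateTextureObject( &texobj, &res_desc, &tex_desc, nullptr );
	return texobj;
}

The returned handle is then copied into a __constant__ variable (e.g., with cudaMemcpyToSymbol), which is what "resides in constant memory" means above.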

tex_idx contains indices for diffuse, specular, bump, and alpha image textures per triangle.
I made a simple tree leaf as a test case.
The leaf is modeled as a quad (two triangles) with a diffuse and an alpha texture only.

The tex_idx for both triangles looks like:
tex_idx.x = 1 (diffuse)
tex_idx.y = 0 (no specular)
tex_idx.z = 0 (no bump)
tex_idx.w = 2 (alpha)
For the leaf test case, tex_idx.x and tex_idx.w are guaranteed to be 1 and 2, respectively.
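
The host-side buffer behind those fetches therefore holds the same uint4 for both triangles; a sketch (the buffer name is illustrative):

// x = diffuse, y = specular, z = bump, w = alpha mask (0 = no texture)
const uint4 leaf_tex_indices[2] = {
	make_uint4( 1, 0, 0, 2 ),	// triangle 0
	make_uint4( 1, 0, 0, 2 )	// triangle 1
};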
The following is the correct rendering for the leaf test case:

https://drive.google.com/open?id=0B2Cl1FjxGIekVnpmaWdlc3RDbE0

CODE 1 (in debug build) produces the correct rendering.

CODE 1 (in release build) produces the following incorrect rendering:
https://drive.google.com/open?id=0B2Cl1FjxGIekWlJtYV9Ra1NFbzQ
(wrong transparency for the second triangle)

However, the above code works fine if I hard-code the index for the alpha mask (notice the change on the tex2D line):
tex_objects[tex_idx.w] is changed to tex_objects[2]
CODE 2:

if ( alphaMasked() ) {
	// _device_texture_indices_texobj is a cudaTextureObject_t and resides in constant memory.
	const uint4 tex_idx = tex1Dfetch<uint4>( _device_texture_indices_texobj, tri_idx );
	// The fourth component of tex_idx is the index of the alpha mask image texture.
	// tex_objects is an array of cudaTextureObject_t referencing all image textures (diffuse, spec, bump, and alpha).
	const float alpha = tex2D<float4>( tex_objects[2], uv.x, 1.0f - uv.y ).x;
	if ( alpha != 1.0f ) { return false; }
}

Here’s the interesting part. I added a simple check to print an error message if tex_idx.w is not 2. The error message is never printed, yet the incorrect rendering is still displayed.
CODE 3:

if ( alphaMasked() ) {
	// _device_texture_indices_texobj is a cudaTextureObject_t and resides in constant memory.
	const uint4 tex_idx = tex1Dfetch<uint4>( _device_texture_indices_texobj, tri_idx );
	if ( tex_idx.w != 2 ) printf( " ERROR tex_idx=%u\n", tex_idx.w );
	// The fourth component of tex_idx is the index of the alpha mask image texture.
	// tex_objects is an array of cudaTextureObject_t referencing all image textures (diffuse, spec, bump, and alpha).
	const float alpha = tex2D<float4>( tex_objects[2], uv.x, 1.0f - uv.y ).x;
	if ( alpha != 1.0f ) { return false; }
}

As I mentioned, the original CODE 1 produces the correct rendering in a debug build (where device code is compiled with -G and most optimizations are disabled). No shared memory is used anywhere in the code.

Even though shared memory is never used, the behavior I described above still points to a sync
issue somewhere, so for the heck of it I added a __syncthreads() to CODE 1, and it “fixed” the problem.
CODE 4:

if ( alphaMasked() ) {
	// _device_texture_indices_texobj is a cudaTextureObject_t and resides in constant memory.
	const uint4 tex_idx = tex1Dfetch<uint4>( _device_texture_indices_texobj, tri_idx );
	__syncthreads();
	// The fourth component of tex_idx is the index of the alpha mask image texture.
	// tex_objects is an array of cudaTextureObject_t referencing all image textures (diffuse, spec, bump, and alpha).
	const float alpha = tex2D<float4>( tex_objects[tex_idx.w], uv.x, 1.0f - uv.y ).x;
	if ( alpha != 1.0f ) { return false; }
}

I’m still baffled by why a sync is somehow fixing the issue. I don’t believe __syncthreads() is the correct solution here; for one thing, calling it inside a potentially divergent branch like alphaMasked() is itself undefined behavior.
Any ideas would be greatly appreciated.
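
For what it’s worth, one less intrusive experiment than a block-wide sync would be a compiler-only barrier. A sketch, assuming the root cause is optimizer code motion around the two fetches (the empty inline-asm statement with a "memory" clobber blocks reordering without any runtime synchronization):

if ( alphaMasked() ) {
	const uint4 tex_idx = tex1Dfetch<uint4>( _device_texture_indices_texobj, tri_idx );
	// Compiler-only barrier: no execution sync, just prevents code motion across this point.
	asm volatile( "" ::: "memory" );
	const float alpha = tex2D<float4>( tex_objects[tex_idx.w], uv.x, 1.0f - uv.y ).x;
	if ( alpha != 1.0f ) { return false; }
}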

The following is a description of my hardware and build environment:
GPU: NVIDIA Titan Z (Kepler), driver 378.66 (same issue with the driver that came with the CUDA 8 toolkit)
OS: Windows 10
CUDA: 8.0
IDE: Microsoft Visual Studio Community 2015, Version 14.0.25431.01 (Update 3)

Just in case someone else has encountered the same problem:

I submitted a bug report to the CUDA support team on March 22nd, 2017. The bug is fixed (I tested it) in the CUDA Toolkit 9.0 RC.