No Texture Fetches

Hi all,

Apologies if this is the wrong place to ask this, or if I’ve just done something silly. I’m not sure how to interpret the texture cache statistics that Nsight VSE reports for one of my kernels. I’ve verified that execution steps into the region of code that reads from a 3D texture. See the code below, which has been stripped down to make it more legible.

Note that the full code writes an output, so I believe the compiler isn’t optimizing away the texture read. I have also verified that the coordinates used for the texture read are valid.

__global__ void _cudaBrickingRaycaster_RenderBrick_(BrickJob* b, Camera c, float* resultForForcing)
{
	// Split the ray segments evenly across the threads of a block.
	int ray_segments_per_thread = b->num_ray_segments / THREADS_PER_BLOCK_X;
	if ((b->num_ray_segments % THREADS_PER_BLOCK_X) != 0)
		++ray_segments_per_thread; // presumed round-up; this line was stripped from the post

	int ray_index = ((blockIdx.x * blockDim.x) + threadIdx.x) * ray_segments_per_thread;
	int ray_index_limit = min(b->num_ray_segments, ray_index + ray_segments_per_thread);
	float sample;
	float result = 0.0f;

	while (ray_index < ray_index_limit)
	{
		glm::vec3 ray = _cuda_camera_ray(c, b->ray_segments[ray_index].x, b->ray_segments[ray_index].y);
		glm::vec3 sample_point = (ray * (STEP_SIZE * (float)b->ray_segments[ray_index].start_sample)) + c.position;
		glm::vec3 step_vector = ray * STEP_SIZE;

		// Move into the brick's local coordinate space before sampling.
		sample_point -= b->brick_offset;
		int num_samples = b->ray_segments[ray_index].end_sample - b->ray_segments[ray_index].start_sample;
		for (int i = 0; i < num_samples; ++i)
		{
			sample = tex3D<float>(b->brick_texture.tex, sample_point.x, sample_point.y, sample_point.z);
			result += sample;
			sample_point += step_vector;
		}
		++ray_index; // presumed; the loop would not terminate without it
	}
	// (the write of result to the output is omitted in this stripped-down version)
}
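
In case it helps to check the indexing, here is a minimal host-side sketch of how the kernel divides ray segments among threads. It assumes the remainder check is meant to round the per-thread count up, and the function name `ray_range_for_thread` and its parameters are hypothetical stand-ins for `b->num_ray_segments`, `THREADS_PER_BLOCK_X`, `blockIdx.x`, `blockDim.x` and `threadIdx.x`.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Hypothetical host-side mirror of the kernel's index arithmetic.
// Returns the half-open range [first, limit) of ray segments for one thread.
// If first >= limit the thread has nothing to do (the while loop is skipped).
std::pair<int, int> ray_range_for_thread(int num_ray_segments,
                                         int threads_per_block_x,
                                         int block_idx,
                                         int block_dim,
                                         int thread_idx)
{
    int per_thread = num_ray_segments / threads_per_block_x;
    if ((num_ray_segments % threads_per_block_x) != 0)
        ++per_thread; // assumed round-up so the remainder is covered

    int first = ((block_idx * block_dim) + thread_idx) * per_thread;
    int limit = std::min(num_ray_segments, first + per_thread);
    return {first, limit};
}
```

With 10 segments and 4 threads per block, threads 0 to 3 of block 0 get [0, 3), [3, 6), [6, 9) and [9, 10). Under this assumption a single block already covers every segment, and threads in any further blocks get empty ranges, so they never reach the `tex3D` call.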


My issue is that Nsight reports 0 texture fetches for the kernel, but 731M load transactions from the texture cache and 34M bank conflicts in the texture cache. I don’t understand how there can be load transactions and bank conflicts when no texture fetches are counted.

See this image:

Thanks for any help you can provide.