Hi all,
Apologies if this is the wrong place to ask, or if I’ve just done something silly. I’m not sure how to interpret the texture cache statistics that Nsight VSE reports for one of my kernels. By stepping through the kernel I’ve verified that execution reaches the region of code that reads from a 3D texture. See the code below, which has been stripped down to make it more legible.
Note that the full code writes an output, so I don’t believe the compiler is optimizing away the texture read. I have also verified that the coordinates used for the texture read are valid.
__global__ void _cudaBrickingRaycaster_RenderBrick_(BrickJob* b, Camera c, float* resultForForcing)
{
    // Split the brick's ray segments evenly across threads (ceiling division).
    int ray_segments_per_thread = b->num_ray_segments / THREADS_PER_BLOCK_X;
    if ((b->num_ray_segments % THREADS_PER_BLOCK_X) != 0)
        ray_segments_per_thread++;

    // Range of ray segments handled by this thread: [ray_index, ray_index_limit).
    int ray_index = ((blockIdx.x * blockDim.x) + threadIdx.x) * ray_segments_per_thread;
    int ray_index_limit = min(b->num_ray_segments, ray_index + ray_segments_per_thread);

    float sample;
    float result = 0.0f;
    while (ray_index < ray_index_limit)
    {
        // Start at the segment's first sample, in brick-local coordinates.
        glm::vec3 ray = _cuda_camera_ray(c, b->ray_segments[ray_index].x, b->ray_segments[ray_index].y);
        glm::vec3 sample_point = (ray * (STEP_SIZE * (float)b->ray_segments[ray_index].start_sample)) + c.position;
        glm::vec3 step_vector = ray * STEP_SIZE;
        sample_point -= b->brick_offset;

        int num_samples = b->ray_segments[ray_index].end_sample - b->ray_segments[ray_index].start_sample;
        for (int i = 0; i < num_samples; ++i)
        {
            // The 3D texture read in question.
            sample = tex3D<float>(b->brick_texture.tex, sample_point.x, sample_point.y, sample_point.z);
            result += sample;
            sample_point += step_vector;
        }
        ++ray_index;
    }
    // (The full code writes an output; that part is omitted in this stripped-down version.)
}
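In case the indexing is relevant: the per-thread work split at the top of the kernel can be checked on the host. Below is a minimal sketch in plain C++, assuming a single block of threads so that the thread index runs from 0 to threads_per_block - 1. The names SegmentRange, segment_range, and segments_visited are hypothetical helpers for illustration and are not part of the actual code.

```cpp
#include <algorithm>
#include <cassert>

// Half-open range [begin, end) of ray segments handled by one thread.
// Hypothetical helper type, not part of the original kernel.
struct SegmentRange {
    int begin;
    int end;
};

// Mirrors the index arithmetic at the top of the kernel.
// thread_index stands in for (blockIdx.x * blockDim.x) + threadIdx.x,
// threads_per_block stands in for THREADS_PER_BLOCK_X.
SegmentRange segment_range(int thread_index, int threads_per_block, int num_ray_segments)
{
    assert(threads_per_block > 0);

    // Ceiling division: segments assigned to each thread.
    int per_thread = num_ray_segments / threads_per_block;
    if (num_ray_segments % threads_per_block != 0)
        ++per_thread;

    int begin = thread_index * per_thread;
    int end = std::min(num_ray_segments, begin + per_thread);
    return {begin, end};
}

// Total number of segments visited by all threads of one block.
// A thread whose begin is at or past its end does no work,
// just as the kernel's while loop would not execute.
int segments_visited(int threads_per_block, int num_ray_segments)
{
    int total = 0;
    for (int t = 0; t < threads_per_block; ++t) {
        SegmentRange r = segment_range(t, threads_per_block, num_ray_segments);
        total += std::max(0, r.end - r.begin);
    }
    return total;
}
```

For example, with 128 threads and 1000 ray segments, each thread takes up to 8 segments, threads 0 to 124 cover all of them, and the remaining threads do no work.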
My issue is that Nsight reports 0 texture fetches for the kernel, yet 731M load transactions and 34M bank conflicts in the texture cache. I’m not sure how the fetch count can be 0 given those cache numbers.
See this image: http://i.imgur.com/s9TCCzm.png
Thanks for any help you can provide.