multiple texture fetch bandwidth

flossy · April 21, 2008, 7:35am

Hello!

I was delighted that the bw_test.cu program gets near-optimal bandwidth from the G80 GPU with a very simple kernel:

```cuda
const unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;
g_odata[idx] = tex1Dfetch(tex_float, idx);  // → 73 GiB/s!
```
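The GiB/s figures in this thread are presumably effective bandwidth: bytes moved divided by kernel time. bw_test.cu's exact accounting isn't shown here, so this sketch assumes every fetched byte and every stored byte is counted. `effective_gib_per_s` is a hypothetical helper, not part of bw_test.cu.

```cuda
#include <cstddef>

// Hypothetical helper: effective bandwidth in GiB/s, assuming both the bytes a
// kernel reads and the bytes it writes are counted for each of n elements.
double effective_gib_per_s(std::size_t n,
                           std::size_t bytes_read_per_elem,
                           std::size_t bytes_written_per_elem,
                           double elapsed_ms)
{
    const double bytes = static_cast<double>(n) *
                         static_cast<double>(bytes_read_per_elem + bytes_written_per_elem);
    const double gib = bytes / (1024.0 * 1024.0 * 1024.0);  // 1 GiB = 2^30 bytes
    return gib / (elapsed_ms / 1000.0);                      // ms -> s
}
```

Under that accounting the copy kernel moves 8 bytes per element (one 4-byte fetch, one 4-byte store), while cuda_pointop moves 20 bytes per element (four fetches, one store). The same GiB/s therefore corresponds to 2.5 times fewer elements per second for cuda_pointop.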

Now I want to do the same with my slightly less simple kernel. All four texture references are simply bound to cudaMalloc'ed data blocks of approximately 18M floats each:

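```cuda
#define THREADS_PER_BLOCK 384

// Legacy texture references (removed in CUDA 12); each one is bound on the host
// with cudaBindTexture to a cudaMalloc'ed array of ~18M floats.
texture<float, 1, cudaReadModeElementType> a_texref;
texture<float, 1, cudaReadModeElementType> b_texref;
texture<float, 1, cudaReadModeElementType> c_texref;
texture<float, 1, cudaReadModeElementType> d_texref;

// a[i] = max(a[i] - b[i] + c[i] * d[i], 0), one thread per point.
// Assumes a launch with THREADS_PER_BLOCK threads per block;
// b, c and d are only read through their texture references.
__global__ void cuda_pointop(float *a, float *b, float *c, float *d, int n)
{
    const unsigned int idx = THREADS_PER_BLOCK * blockIdx.x + threadIdx.x;

    if (idx < n)
    {
        // Four separate 4-byte texture fetches per thread.
        float a_in = tex1Dfetch(a_texref, idx);
        float b_in = tex1Dfetch(b_texref, idx);
        float c_in = tex1Dfetch(c_texref, idx);
        float d_in = tex1Dfetch(d_texref, idx);

        float a_out = a_in - b_in + c_in * d_in;
        if (a_out < 0.0f) a_out = 0.0f;  // clamp negatives; 0.0f avoids a double-precision compare

        a[idx] = a_out;
    }
}
```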
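Texture references such as a_texref were removed in CUDA 12, so on current toolkits the same setup would use texture objects. The following is a minimal sketch under that assumption; `make_linear_float_tex`, `pointop_texobj` and `pointop_reference` are hypothetical names. The host reference is there to check the GPU output.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Wrap an existing cudaMalloc'ed array of n floats in a texture object
// (the texture-object counterpart of binding a texture reference to linear memory).
cudaTextureObject_t make_linear_float_tex(const float* d_ptr, std::size_t n)
{
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = const_cast<float*>(d_ptr);
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = n * sizeof(float);

    cudaTextureDesc tex = {};
    tex.readMode = cudaReadModeElementType;

    cudaTextureObject_t obj = 0;
    if (cudaCreateTextureObject(&obj, &res, &tex, nullptr) != cudaSuccess)
        return 0;  // this sketch uses 0 as a failure sentinel
    return obj;
}

// Same arithmetic as cuda_pointop, fetching through texture objects.
__global__ void pointop_texobj(cudaTextureObject_t a_tex, cudaTextureObject_t b_tex,
                               cudaTextureObject_t c_tex, cudaTextureObject_t d_tex,
                               float* a_out, int n)
{
    const int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        float r = tex1Dfetch<float>(a_tex, idx) - tex1Dfetch<float>(b_tex, idx)
                + tex1Dfetch<float>(c_tex, idx) * tex1Dfetch<float>(d_tex, idx);
        if (r < 0.0f) r = 0.0f;  // clamp negatives, as in cuda_pointop
        a_out[idx] = r;
    }
}

// Host reference for checking the GPU result element by element.
void pointop_reference(const float* a, const float* b, const float* c,
                       const float* d, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        const float r = a[i] - b[i] + c[i] * d[i];
        out[i] = (r < 0.0f) ? 0.0f : r;
    }
}
```

A launch would look like `pointop_texobj<<<(n + 383) / 384, 384>>>(ta, tb, tc, td, d_a, n);`, followed by `cudaDestroyTextureObject` on each object when done. The GPU may contract `c * d` and the add into an FMA, so compare against the host reference with a small tolerance rather than exactly.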

I only get about half of that (37 GiB/s).

Why? Can anyone help me get more bandwidth?

Thanks!

Phil.

MisterAnderson42 · April 21, 2008, 2:06pm

Pack your data into a float4 texture and read that; you should be able to reach optimal bandwidth again.
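A minimal sketch of the float4 layout, assuming the four arrays can be interleaved once on the host before upload. `pack_abcd` and `pointop_float4` are hypothetical names, and `abcd_tex` is assumed to be a texture object over the packed array created with `cudaCreateChannelDesc<float4>()` as its channel descriptor. Each thread then issues one 16-byte fetch instead of four 4-byte fetches; whether that restores full bandwidth on a given GPU is something to measure rather than assume.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Host side: interleave a, b, c, d so each point's four values occupy
// 16 contiguous bytes (one float4).
void pack_abcd(const float* a, const float* b, const float* c, const float* d,
               float4* packed, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        packed[i] = make_float4(a[i], b[i], c[i], d[i]);
}

// Device side: a single float4 texture fetch per thread.
// x = a, y = b, z = c, w = d.
__global__ void pointop_float4(cudaTextureObject_t abcd_tex, float* a_out, int n)
{
    const int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        const float4 p = tex1Dfetch<float4>(abcd_tex, idx);
        float r = p.x - p.y + p.z * p.w;
        if (r < 0.0f) r = 0.0f;  // clamp negatives, as in cuda_pointop
        a_out[idx] = r;
    }
}
```

Because the result goes to a separate a_out array, the packed `.x` components go stale after each pass; an iterative code would need to write results back into the packed layout (through a plain pointer, not the texture) or repack.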

That said, I don't have a good explanation for why four separate float texture reads are slower (I have observed this behavior too). It may be because the hardware has a limited number of texture addressing units. If so, G9x hardware should perform better in situations like this; I'll check out of my own curiosity as soon as my G9x box is up and running.
