Does tex2D always return vector-4 data?

Hella_Yu · March 25, 2008, 1:31am

In the program, texL is bound to a 2D array in which each element is one float.
In the PTX code, I found that loading one element from texture texL at (ti, tj)
compiles to
tex.2d.v4.f32.f32 {$f10, $f11, $f12, $f13},texL, {$f6, $f7, $f8, $f9}
and only $f10 holds the data needed; $f11 to $f13 are unused.

Does this mean a texture fetch is most efficient when it loads vector-4 data at a time, so that the fetch above wastes 3/4 of that efficiency?
Thanks.
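For anyone reproducing this today, here is a minimal sketch, assuming a recent CUDA toolkit. The texture-reference API this thread refers to was removed in CUDA 12, so the sketch binds a single-channel float CUDA array to a texture object instead and reads it with `tex2D<float>`. Compiling it with `nvcc -ptx` typically still shows a `.v4` texture instruction with only the first destination register used, which is the pattern asked about above. The names `copyTexels` and `texelsRoundTrip` are hypothetical, chosen for illustration only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread fetches one single-channel float texel and writes it out.
// Unnormalized coordinates: +0.5f addresses the centre of texel (x, y).
__global__ void copyTexels(cudaTextureObject_t texL, float* out,
                           int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[y * width + x] = tex2D<float>(texL, x + 0.5f, y + 0.5f);
}

// Hypothetical helper: uploads a small float grid into a CUDA array,
// reads it back through tex2D, and reports whether every texel matches.
bool texelsRoundTrip()
{
    const int width = 8, height = 4, n = width * height;
    float host[n], result[n];
    for (int i = 0; i < n; ++i) host[i] = static_cast<float>(i);

    // One 32-bit float channel per element, as in the original question.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr = nullptr;
    if (cudaMallocArray(&arr, &desc, width, height) != cudaSuccess) return false;
    cudaMemcpy2DToArray(arr, 0, 0, host, width * sizeof(float),
                        width * sizeof(float), height, cudaMemcpyHostToDevice);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;    // no interpolation between texels
    td.readMode = cudaReadModeElementType;  // return the stored float as-is
    td.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    float* dOut = nullptr;
    cudaMalloc(&dOut, n * sizeof(float));
    copyTexels<<<1, dim3(width, height)>>>(tex, dOut, width, height);
    cudaError_t err = cudaMemcpy(result, dOut, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);

    cudaDestroyTextureObject(tex);
    cudaFree(dOut);
    cudaFreeArray(arr);

    if (err != cudaSuccess) return false;
    for (int i = 0; i < n; ++i)
        if (result[i] != host[i]) return false;
    return true;
}

int main()
{
    std::puts(texelsRoundTrip() ? "ok" : "mismatch");
    return 0;
}
```

Point filtering with unnormalized coordinates is used so each fetch returns exactly one stored element, which makes the round trip easy to check.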

nwilt · March 25, 2008, 1:50am

No. If a 4-tuple is not being fetched, the unused registers are available for other uses, and the compiler should generate code accordingly.

Hella_Yu · March 25, 2008, 3:12am

But I mean, does each fetch have to use 4 registers, even if three of them are not used?

Also, accessing texture memory bound to a 2D array seems to involve several extra instructions for computing the address in the PTX code. Is there a way to get rid of them?

MisterAnderson42 · March 25, 2008, 11:44am

The compiler's register-optimization step will automatically get rid of them.

Hella_Yu · March 25, 2008, 2:17pm

So they will be kept in the PTX code but removed in the binary executable, is that correct?

I ask because I want to estimate the performance bottleneck, and I'm not sure whether every instruction in the PTX code should be counted as one instruction in actual execution.

Thanks

Simon_Green · March 25, 2008, 4:41pm

No, PTX to hardware code conversion and optimization happens at run-time.

I wouldn’t worry too much about the PTX code; don’t forget that premature optimization is the root of all evil.

seibert · March 25, 2008, 4:43pm

You should take a look at decuda (written by “wumpus” in the forums):

http://www.cs.rug.nl/~wladimir/decuda/

This tool lets you disassemble the cubin output of ptxas and see the actual instructions and register allocations used in the binary uploaded to the card.
