Coalesced access for misaligned float

On page 87 of the CUDA Programming Guide version 2.3.1
there is the following explanation:
"Center: misaligned float memory access resulting in one transaction." for CUDA 1.2 or higher.

But in my case, the following code gives results suggesting that a misaligned float memory access results in 16 transactions.
__global__ void offsetcopy(float *odata, float *idata, int offset)
{
    // Shift every thread's index by `offset` elements to misalign the access.
    int xid = blockIdx.x * blockDim.x + threadIdx.x + offset;
    odata[xid] = idata[xid];
}
For offset = 1, 2, …, 15, an 8x performance degradation arises.
I take this to mean that 16 transactions are issued per half-warp.

What happens with this misaligned memory access?
Is the programming guide wrong?

But that isn’t what it says. It says “Examples of Global Memory Access by Devices with Compute Capability 1.2 and Higher” [emphasis mine]. My guess is that your card is a compute capability 1.1 card (GF8xxx, GF9xxx, GTS1xx, GTS2xx or the Quadro/Tesla equivalent).
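
To make the difference concrete, here is a minimal host-side sketch (plain C++, nothing runs on the GPU) that counts the transactions one half-warp of float reads would need under the two rule sets, as I understand them from the programming guide. The function names are made up for illustration, the array is assumed to start at a 128-byte-aligned address, and every thread of the half-warp is assumed to be active. It is a simplified model, not a description of what the hardware actually does internally.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Byte addresses requested by one half-warp (16 threads) reading 4-byte
// floats, where thread k reads element (first_index + k). The array base
// is ASSUMED to sit at a 128-byte-aligned address (taken as address 0).
std::vector<long> half_warp_addresses(long first_index) {
    std::vector<long> addrs;
    for (int k = 0; k < 16; ++k) {
        addrs.push_back((first_index + k) * 4);
    }
    return addrs;
}

// Compute capability 1.0/1.1 model: one transaction only if thread k reads
// the k-th word of a 64-byte-aligned segment; otherwise each of the 16
// threads gets its own transaction.
int transactions_cc10_11(const std::vector<long>& addrs) {
    assert(addrs.size() == 16);
    bool aligned = (addrs[0] % 64 == 0);
    bool in_order = true;
    for (int k = 0; k < 16; ++k) {
        if (addrs[k] != addrs[0] + 4L * k) in_order = false;
    }
    return (aligned && in_order) ? 1 : 16;
}

// Compute capability 1.2/1.3 model for 4-byte words: one transaction per
// distinct 128-byte segment touched by the half-warp. (The hardware may
// shrink a transaction to 64 or 32 bytes, which does not change the count.)
int transactions_cc12_13(const std::vector<long>& addrs) {
    assert(addrs.size() == 16);
    std::set<long> segments;
    for (long a : addrs) {
        segments.insert(a / 128);
    }
    return static_cast<int>(segments.size());
}
```

With offset = 1, the first half-warp of block 0 starts at element 1: transactions_cc10_11(half_warp_addresses(1)) gives 16, while transactions_cc12_13(half_warp_addresses(1)) gives 1, which is the "one transaction" case in the guide's figure. The second half-warp starts at element 17 and straddles a 128-byte boundary, so the 1.2/1.3 model gives 2 for it.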

thank you so much !

I misunderstood the meaning of “Compute capability 1.2 or higher”.
I thought it just meant the version of CUDA (such as CUDA v2.3) ^^;;

My graphics card is a 9500 GT, so it is compute capability 1.1.
Thanks again!

PS: Which graphics cards support compute capability 1.2 or higher?

Right now only the GTX200 desktop series (and Quadro equivalents) and the Tesla C1060/S1070 series are compute 1.3 capable. There have been some compute 1.2 mobile parts announced, but they probably won’t be in anything you can buy until the end of the year.

Appendix A of the CUDA 2.3 Programming Guide contains a complete (as of the publication of the guide) list of devices and compute capabilities.
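
If you would rather ask the card itself than look it up, the runtime API reports the compute capability. A minimal sketch, assuming the CUDA toolkit is installed and the file is compiled with nvcc (the file layout and the wording of the printed message are my own choices):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                         cudaGetErrorString(err));
            return 1;
        }

        // prop.major and prop.minor hold the compute capability, which is
        // a property of the hardware and unrelated to the toolkit version.
        bool at_least_1_2 =
            (prop.major > 1) || (prop.major == 1 && prop.minor >= 2);

        std::printf("Device %d: %s, compute capability %d.%d (%s)\n",
                    dev, prop.name, prop.major, prop.minor,
                    at_least_1_2 ? "1.2 or higher" : "below 1.2");
    }
    return 0;
}
```

On a 9500 GT this should report compute capability 1.1, which is why the figure for 1.2 and higher does not apply to it.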