CUDA texturing bug (or not all cases covered in the programming guide)

Consider this minimalistic repro program:

texture<float, 1, cudaReadModeElementType> tex;

void myWrapper (void)

{

	textureReference *texRef;

	cudaGetTextureReference(&texRef, "tex");

	// etc, continue binding that texture and launching a kernel

}

This is a direct port from the programming guide (version 2.2, first code sample on page 30 (pdf page 38)), using the so-called low-level API.

Assuming this is saved to a file “repro.cu”.

Compile with

nvcc -I/usr/local/cuda/2.2/include -g --host-compilation C --ptxas-options=-v -arch sm_13 -c repro.cu -o repro.o

and nvcc (gcc, actually) spits out a warning:

argument of type “textureReference **” is incompatible with parameter of type “const textureReference **”

Compile with --host-compilation C++

and the warning becomes an error.

The fix is trivial, declare the texture reference as const. This is on 32-bit OpenSuSE 11.1 with CUDA 2.2, GTX280, and the official gcc that ships with the distro. It would be nice if a future release of the progguide would not lead to this confusion.

In related news, here’s a few more pointers to stuff that might require attention in future releases of the documentation:

The reference manual states, for (C-API) cudaBindTexture():

If the device memory pointer was returned from cudaMalloc(), the offset is guaranteed to be 0 and NULL may be passed as the offset parameter.

This is not correct anymore once some index arithmetic is going on, which is officially supported. I learned this the hard way, in my app, the offset is nonzero only for “selected blockDims”. Suggested fix:

If the device memory pointer returned from cudaMalloc() has not been modified by the application, the offset is guaranteed to be 0 and NULL may be passed as the offset parameter.

It furthermore states:

Since the hardware enforces an alignment requirement on texture base addresses, cudaBindTexture() returns in ∗offset a byte offset that must be applied to texture fetches in order to read from the desired memory. This off set must be divided by the texel size and passed to kernels that read from the texture so they can be applied to thetex1Dfetch() function.

Misleading, because the texel size is in bytes: If the channelDesc is single-channel float, then you have to divide by sizeof(float) and not 1. Suggested fix: obvious.

Finally, the most annoying bits: The API reference emphasises in detail the difference between the low-level C-style entry points and the high-level C++ style. The “low-level” example in the progguide (see pageref above) uses templates! nvcc under the hood is C++, Tim Murray said something along the lines elsewhere in these forums ("–host-compilation C does not enforce C"). So this works, but misleads. Also, the progguide example uses tex2D. Analogously using tex1D for linear memory wrecks havok! Please emphasise that tex1Dfetch() is the intrinsic to use. In summary: Please rewrite the texturing bit in the progguide to match the API guide and to be actually useful.

I won’t go on about the SDK which is “in line” as in “mixes stuff all over the place for simplicity” :)