[Solved] Texture access and inline CUDA ptx assembly in VS 2010

Hello everybody.

I’m currently working on the simple CUDA ray tracer. I decided to write some of the kernels in the inline CUDA ptx assembly to reduce the register overhead ( my 9600M has cc. 1.1 ). I came across the problem when trying to access “texture memory” using the “tex” instruction.

For brevity, let’s consider the following simple kernel converting array of chars to array of ints:

texture<char, 1, cudaReadModeElementType> chars;

__global__ void Foo(int *devIntArr) {
	
//char charVal = tex1Dfetch(chars, threadIdx.x);
//devIntArr[threadIdx.x] = charVal;
	
asm (
    ".reg .b32 r1;\n"
    ".reg .b32 r2;\n"
    ".reg .b32 r3;\n"
    ".reg .b32 r4;\n"
    ".reg .b32 r5;\n"
    "mov.b32 r1, %tid.x;\n"
    "tex.1d.v4.u32.s32 { r2, r3, r4, r5 }, [chars, { r1 }];\n"
    "shl.b32 r1, r1, 2;\n"
    "add.s32 %0, %0, r1;\n"
    "and.b32 r2, r2, 255;\n"
    "st.global.b32 [r1], r2;\n"
    :
    : "r"(devIntArr)
    );
}

cudaError_t FooWithCuda() {
...
}

The compiler says:

  • "State space mismatch between instruction and address in instruction 'tex'";
  • "Label expected for forward reference of 'chars'";
  • When I uncomment the commented lines and vice versa and look at the generated “ptx” code, the line concerning the “tex” instruction is exactly the same as mine. Before the code of the kernel I noticed the line:

    ".tex .u32 chars"
    

    , so it looks like the compiler adds this reference automatically when comes across

    "texture<..., ..., ...> ...
    

    in the source file.

    So, what is the reason of the above mentioned error and how to get around it?

    Thanks in advance.

    Your code seems to compile on sm_20+ architectures. It appears that module-scope symbols in PTX (like “chars”) are recognized by the sm_20+ compiler but not recognized by the older sm_1x compiler (as you noticed).

    I was going to suggest that you declare a PTX snippet with “.tex .u32 chars” at module scope but an “asm(”…“);” declaration at this level is ignored.

    Furthermore, casting a texture reference to a scalar integer type isn’t supported (it would be a nice hack).

    So… I’m not sure there is a workaround but targeting Fermi/Kepler devices (which use the newer compiler) is an option.

    Maybe someone else has a workaround?

    Thanks for Your answer. I tried everything… . I put the asm code with the line You’ve just suggested into my *.cu file outside of the Foo’s declaration (I mean, the global scope), but it didn’t fix the problem - there was another compiler error. I tried also to pass the texture reference to the register using AT&T asm syntax used by the NVIDIA compiler:

    asm (
        ...
        :
        : "r"(chars)
    );
    

    but as You pointed out, casting a texture reference to a scalar integer (in fact, you can hold only the 32-bit data in the 32-bit register) isn’t supported. I wonder if there is another kind of “switch” in the AT&T asm input section, I mean something like:

    asm (
        ...
        :
        : "?"(chars)
    );
    

    , which could facilitate passing the texture reference to the asm code.

    I also came up with the another idea… . What about including pure “.ptx" in the ".cu” file? Is it possible at all?

    You could include your own pure PTX file and load it via a Driver API command. That will be extremely painful fun and a learning experience.

    Using the fatbinary.exe utility you can construct a binary .fatbin that contains a mix of .cubin and .ptx files. It’s a nice way of packaging code.

    ( I’m unaware of any other inline parameter “constraints” than the ones that are listed – I’ve poked around before looking for undocumented constraints but didn’t find anything. )