[Solved] Texture access and inline CUDA ptx assembly in VS 2010

LowLevelKB · September 7, 2013, 10:24pm

Hello everybody.

I’m currently working on the simple CUDA ray tracer. I decided to write some of the kernels in the inline CUDA ptx assembly to reduce the register overhead ( my 9600M has cc. 1.1 ). I came across the problem when trying to access “texture memory” using the “tex” instruction.

For brevity, let’s consider the following simple kernel converting array of chars to array of ints:

texture<char, 1, cudaReadModeElementType> chars;

__global__ void Foo(int *devIntArr) {
	
//char charVal = tex1Dfetch(chars, threadIdx.x);
//devIntArr[threadIdx.x] = charVal;
	
asm (
    ".reg .b32 r1;\n"
    ".reg .b32 r2;\n"
    ".reg .b32 r3;\n"
    ".reg .b32 r4;\n"
    ".reg .b32 r5;\n"
    "mov.b32 r1, %tid.x;\n"
    "tex.1d.v4.u32.s32 { r2, r3, r4, r5 }, [chars, { r1 }];\n"
    "shl.b32 r1, r1, 2;\n"
    "add.s32 %0, %0, r1;\n"
    "and.b32 r2, r2, 255;\n"
    "st.global.b32 [r1], r2;\n"
    :
    : "r"(devIntArr)
    );
}

cudaError_t FooWithCuda() {
...
}

The compiler says:

"State space mismatch between instruction and address in instruction 'tex'";

"Label expected for forward reference of 'chars'";

When I uncomment the commented lines and vice versa and look at the generated “ptx” code, the line concerning the “tex” instruction is exactly the same as mine. Before the code of the kernel I noticed the line:

".tex .u32 chars"

, so it looks like the compiler adds this reference automatically when comes across

"texture<..., ..., ...> ...

in the source file.

So, what is the reason of the above mentioned error and how to get around it?

Thanks in advance.

allanmac · September 7, 2013, 11:47pm

Your code seems to compile on sm_20+ architectures. It appears that module-scope symbols in PTX (like “chars”) are recognized by the sm_20+ compiler but not recognized by the older sm_1x compiler (as you noticed).

I was going to suggest that you declare a PTX snippet with “.tex .u32 chars” at module scope but an “asm(”…“);” declaration at this level is ignored.

Furthermore, casting a texture reference to a scalar integer type isn’t supported (it would be a nice hack).

So… I’m not sure there is a workaround but targeting Fermi/Kepler devices (which use the newer compiler) is an option.

Maybe someone else has a workaround?

LowLevelKB · September 8, 2013, 12:49am

Thanks for Your answer. I tried everything… . I put the asm code with the line You’ve just suggested into my *.cu file outside of the Foo’s declaration (I mean, the global scope), but it didn’t fix the problem - there was another compiler error. I tried also to pass the texture reference to the register using AT&T asm syntax used by the NVIDIA compiler:

asm (
    ...
    :
    : "r"(chars)
);

but as You pointed out, casting a texture reference to a scalar integer (in fact, you can hold only the 32-bit data in the 32-bit register) isn’t supported. I wonder if there is another kind of “switch” in the AT&T asm input section, I mean something like:

asm (
    ...
    :
    : "?"(chars)
);

, which could facilitate passing the texture reference to the asm code.

I also came up with the another idea… . What about including pure “.ptx" in the ".cu” file? Is it possible at all?

allanmac · September 8, 2013, 1:18am

You could include your own pure PTX file and load it via a Driver API command. That will be extremely painful fun and a learning experience.

Using the fatbinary.exe utility you can construct a binary .fatbin that contains a mix of .cubin and .ptx files. It’s a nice way of packaging code.

( I’m unaware of any other inline parameter “constraints” than the ones that are listed – I’ve poked around before looking for undocumented constraints but didn’t find anything. )

Topic		Replies	Views
Texture objects fetching with unknown texture address CUDA Programming and Performance	0	780	February 24, 2015
PTXAS died with status 0xC0000005 (ACCES VIOLATION Compiler problem? CUDA Programming and Performance	3	4465	February 14, 2008
Simple texture problem Code will not compile. CUDA Programming and Performance	8	4229	February 4, 2010
Problem compiling Unknown symbol from compiler in CUDA CUDA Programming and Performance	2	3124	October 29, 2007
ptxas crash with testcase CUDA Programming and Performance	5	9083	May 29, 2008
Issue using inline PTX functions, with address operands, in CUDA application - Any help much appreciated! CUDA Programming and Performance	8	1181	April 7, 2018
Ptx assembly error while using surface CUDA Programming and Performance	1	842	December 16, 2013
ptxas compiles my program wrong CUDA 4.0RC2 CUDA Programming and Performance	2	4522	May 8, 2011
fetch texture memory error! CUDA Programming and Performance	1	597	April 3, 2013
CUDA texturing bug (or not all cases covered in the programming guide) CUDA Programming and Performance	1	5216	July 20, 2009

[Solved] Texture access and inline CUDA ptx assembly in VS 2010

Related topics