CUDA 1.1 (i don’t know whether 1.0 could do that) supports inlining assembler instructions.
unfortunately, referencing high-level language variables inside the asm string does not work:
float t = 1; float s=2; float result; asm("add.f32 result, t, s;");
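the code above leads to PTXAS errors, presumably because the asm string is copied verbatim into the generated PTX, where t, s and result are not declared registers. the following line, which declares them inside the string, is translated without problems: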
float t = 1; float s=2; float result; asm(".reg .f32 t, s, result; add.f32 result, t, s;");
so currently one can either:
stick to HLL
code everything in one asm-string
or declare variables twice (resulting in really nice code)
is that inline-assembler “feature” likely to be improved in the near future? (i mean TRUE inline assembly without the above-mentioned drawbacks)
has anyone found another, hopefully better, way to inline assembly?
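
for comparison, later CUDA toolkits document a GCC-style operand syntax for inline PTX, where C variables are bound to placeholders (%0, %1, ...) through constraint letters ("f" for a .f32 register). the following is only a minimal sketch under that assumption; whether CUDA 1.1 accepts this syntax is not established here. the names add_ptx, add_kernel and add_on_device are hypothetical and chosen for illustration.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: adds two floats with a single PTX instruction.
// Assumes a toolkit that supports GCC-style asm operands.
__device__ float add_ptx(float t, float s)
{
    float result;
    // %0 is the output operand, %1 and %2 are the inputs.
    // "=f" marks a write-only .f32 register, "f" a read-only one.
    asm("add.f32 %0, %1, %2;" : "=f"(result) : "f"(t), "f"(s));
    return result;
}

// Hypothetical kernel: a single thread stores the sum in *out.
__global__ void add_kernel(float t, float s, float *out)
{
    *out = add_ptx(t, s);
}

// Hypothetical host wrapper: runs the kernel and copies the result back.
// Returns 0.0f if the device allocation fails.
float add_on_device(float t, float s)
{
    float *d_out = nullptr;
    float h_out = 0.0f;
    if (cudaMalloc(&d_out, sizeof(float)) != cudaSuccess) {
        return h_out;
    }
    add_kernel<<<1, 1>>>(t, s, d_out);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

calling add_on_device(1.0f, 2.0f) from host code should then give the same sum as the plain HLL expression t + s, with each variable declared only once.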