inline assembly


CUDA 1.1 (I don’t know whether 1.0 could do this) supports inlining assembler instructions via asm("...").


Unfortunately, referencing high-level language variables does not work:

  float t = 1;
  float s = 2;
  float result;
  asm("add.f32 result, t, s;");

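The code above leads to ptxas errors, whereas the following lines are translated without problems (presumably because the asm string is handed to ptxas as-is, so every name it uses has to be declared in PTX itself):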

  float t = 1;
  float s = 2;
  float result;
  asm(".reg .f32 t, s, result; add.f32 result, t, s;");

So currently one can either:

  • stick to HLL

  • code everything in one asm-string

  • or declare variables twice (resulting in really nice code)

Is this inline-assembler “feature” likely to be improved in the near future? (I mean TRUE inline assembly without the drawbacks mentioned above.)

Has anyone found another, hopefully better, way to inline assembly?
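
For comparison, later CUDA toolkits document GCC-style operand constraints for inline PTX, which let the asm string refer to C variables through %0, %1, ... placeholders instead of redeclaring them. This is not something CUDA 1.1 is known to accept, so treat the following as a minimal sketch for a newer toolkit. The names add_ptx, add_kernel and run_add are made up for illustration.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: adds two floats with a single PTX instruction.
__device__ float add_ptx(float t, float s)
{
    float result;
    // %0 is the output operand, %1 and %2 are the inputs.
    // The "f" constraint asks the compiler for a .f32 register.
    asm("add.f32 %0, %1, %2;" : "=f"(result) : "f"(t), "f"(s));
    return result;
}

// Hypothetical kernel: one thread writes the sum to out[0].
__global__ void add_kernel(float t, float s, float *out)
{
    *out = add_ptx(t, s);
}

// Hypothetical host wrapper: launches the kernel and copies the result back.
// Error checking is left out to keep the sketch short.
float run_add(float t, float s)
{
    float *d_out = nullptr;
    float h_out = 0.0f;
    cudaMalloc(&d_out, sizeof(float));
    add_kernel<<<1, 1>>>(t, s, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

With this form the compiler picks the registers for t, s and result itself, so nothing has to be declared twice. Calling run_add(1.0f, 2.0f) from host code on a machine with a working device is enough to try it out.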