dgm4d
March 7, 2011, 1:17am
1
For CUDA 4.0, what are the inline asm register constraint strings for 8-bit and 16-bit types? In previous versions, you would use “r” for any operands 32-bits or smaller. For example, you used to be able to:
// Pointer operand constraints for inlining PTX assembly
#if defined(_WIN64) || defined(__LP64__)
// 64-bit pointer operand constraint for inlined asm
#define _ASM_PTR_ "l"
#else
// 32-bit pointer operand constraint for inlined asm
#define _ASM_PTR_ "r"
#endif
__device__ __forceinline__ void Ld(unsigned short &val, unsigned short* d_ptr, size_t offset) {
asm("ld.global.cg.u16 %0, [%1];" : "=r"(val) : _ASM_PTR_(d_ptr + offset));
}
But now cudafe complains about the “r” modifier for the unsigned short “val”:
error: asm operand type size(2) does not match type/size implied by constraint 'r'
Thanks,
dgm4d
esler
March 7, 2011, 4:32am
2
For CUDA 4.0, what are the inline asm register constraint strings for 8-bit and 16-bit types? In previous versions, you would use “r” for any operands 32-bits or smaller. For example, you used to be able to:
// Pointer operand constraints for inlining PTX assembly
#if defined(_WIN64) || defined(__LP64__)
// 64-bit pointer operand constraint for inlined asm
#define _ASM_PTR_ "l"
#else
// 32-bit pointer operand constraint for inlined asm
#define _ASM_PTR_ "r"
#endif
__device__ __forceinline__ void Ld(unsigned short &val, unsigned short* d_ptr, size_t offset) {
asm("ld.global.cg.u16 %0, [%1];" : "=r"(val) : _ASM_PTR_(d_ptr + offset));
}
But now cudafe complains about the “r” modifier for the unsigned short “val”:
error: asm operand type size(2) does not match type/size implied by constraint 'r'
Thanks,
dgm4d
There is a new PDF in the doc directory in inline assembly. According to it, .u16 registers use the constraint “h”.
dgm4d
March 7, 2011, 5:49am
3
Thanks! “h” works perfectly for 16-bit shorts. Hoping that there was just a document omission for 8-bit chars, I tried all 42 lower/upper single-character register constraints, but sadly there doesn’t seem to be anything for the 8-bit types.
Casting seems to work for the time being (since everything’s 32-bit registers anyway), even if it’s a little brittle-looking:
__device__ __forceinline__ void Ld(unsigned char &val, unsigned char* d_ptr, size_t offset) {
asm("ld.global.cg.u8 %0, [%1];" : "=r"(*reinterpret_cast<unsigned int*>(&val)) : _ASM_PTR_(d_ptr + offset));
}
The code seems to fail on Warp Misaligned Address error. Weird, because u8 is supposed to have no alignment constraints.
Any ideas?
Thanks! “h” works perfectly for 16-bit shorts. Hoping that there was just a document omission for 8-bit chars, I tried all 42 lower/upper single-character register constraints, but sadly there doesn’t seem to be anything for the 8-bit types.
Casting seems to work for the time being (since everything’s 32-bit registers anyway), even if it’s a little brittle-looking:
__device__ __forceinline__ void Ld(unsigned char &val, unsigned char* d_ptr, size_t offset) {
asm("ld.global.cg.u8 %0, [%1];" : "=r"(*reinterpret_cast<unsigned int*>(&val)) : _ASM_PTR_(d_ptr + offset));
}