Let me introduce my experimental nvopencc compiler with addc intrinsics support ;)
ATTENTION: this is for 32-bit Linux users!!
New features are summarized as follows:
- all 32-bit integer add instructions (signed/unsigned) are replaced by their “carry-out” versions, i.e. add.u32/s32 become add.cc.u32/s32 (otherwise there is no way to use the carry-out)
- two new builtins were added: __addc( a, b ) and __uaddc( a, b ) for signed and unsigned additions with carry, respectively; they map directly to addc.cc.s32/u32 instructions
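To illustrate what the carry chain buys you, here is a portable C sketch of a 128-bit addition done in four 32-bit words: the lowest word uses an add that produces a carry-out (what add.cc.u32 does), and every higher word adds the incoming carry and produces a new one (what addc.cc.u32 does). This is only an emulation of the semantics on the CPU, not the compiler's code and not a use of the __addc/__uaddc builtins themselves; the names add_cc, addc_cc, add_multiword and LIMBS are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LIMBS 4 /* hypothetical: 4 x 32-bit words = 128-bit numbers */

/* Emulates add.cc.u32: returns a + b and writes the carry-out (0 or 1) to *cf. */
static uint32_t add_cc(uint32_t a, uint32_t b, uint32_t *cf) {
    uint32_t s = a + b;   /* wraps modulo 2^32 */
    *cf = (s < a);        /* a wraparound means a carry occurred */
    return s;
}

/* Emulates addc.cc.u32: returns a + b + carry-in and updates *cf with the carry-out. */
static uint32_t addc_cc(uint32_t a, uint32_t b, uint32_t *cf) {
    assert(*cf <= 1u);    /* the carry flag can only be 0 or 1 */
    uint64_t s = (uint64_t)a + b + *cf;
    *cf = (uint32_t)(s >> 32);
    return (uint32_t)s;
}

/* r = x + y, words stored least-significant first; returns the final carry-out. */
static uint32_t add_multiword(uint32_t r[LIMBS], const uint32_t x[LIMBS],
                              const uint32_t y[LIMBS]) {
    uint32_t cf;
    r[0] = add_cc(x[0], y[0], &cf);      /* lowest word: plain add with carry-out */
    for (int i = 1; i < LIMBS; ++i)
        r[i] = addc_cc(x[i], y[i], &cf); /* higher words: propagate the carry */
    return cf;
}
```

For example, adding 1 to x = {0xFFFFFFFF, 0xFFFFFFFF, 0, 0} ripples the carry through the two low words and yields {0, 0, 1, 0} with no final carry. Without a carry-out on the first add, the second word would have to recompute the carry with an extra comparison, which is exactly the overhead the add.cc/addc.cc pair avoids on the GPU.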
I also wanted to add parallel reductions, but there are some subtleties in managing global/shared memory pointers that I don't quite understand yet, so maybe this will be added later.
Although I have tested this compiler with my kernels and it works well, please be aware that these features are still highly experimental, so if you want to try it out, do so at your own risk!
There is also one major drawback: for some reason, the open64 trunk version does not expand floating-point divisions, so any floating-point division in your code will trigger an assertion, something like: “Floating-point division is not yet implemented…”
On the other hand, the GPU has no native division instruction anyway, so division was implemented as a slow local subroutine…
Anyway, if you want to try it out, installation is very simple: unpack the NVOPENCC archive, copy ‘be’ (compiler back-end), ‘gfec’ (gcc front-end) and ‘inline’ into /path/to/cuda/open64/lib, copy ‘nvopencc’ into /path/to/cuda/open64/bin, and include ‘ext_intrinsics.h’ from your code or some library file.
Finally, add ‘/path/to/cuda/open64/lib’ to your PATH variable (I haven't completely figured out how nvcc searches for the different compilation phases, so this workaround is required). Detailed instructions can also be found in the archive.
Suggestions/comments are welcome!