I encountered this compiler problem (nvcc v6.0/v6.5) when targeting compute capability below 2.0. There is no problem with compute capability 2.0 and above.
It seems to happen only with ISTORE (indirect store?), for both shared memory and device memory.
Googling the Open64 compiler's “LDA/ILOAD/ISTORE Folding” turns up two options:
fold_lda_iload: Enables LDA-ILOAD/ISTORE coderep folding phase
simp_iload: Enable simplification of ILOAD-LDA to LDID
It seems nvcc uses fold_lda_iload. Is it possible to pass simp_iload as a compiler option to work around this problem?
Can you provide a simple test case (code) and compile command which generates the issue?
I guess I should also mention that, as you’re pointing out, the Open64 compiler is currently only used to generate code for pre-cc2.0 devices. Support for pre-cc2.0 devices is deprecated in CUDA 6.5 and will probably disappear in some future release. It’s therefore unlikely that any new features will be added to the Open64 component of the nvcc compiler driver.
Thanks for the reply. Sorry, I cannot provide a simple test case. However, I worked around it and finally got the code running on 1.x devices as well.
I guess the cause of the problem is pointer support on 1.x devices, due either to limited compiler support or to a hardware limitation. So, based on the working version (cc2.0 and above), I wrote a separate 1.x version that has no indirect access to constant memory. The compiler message is gone and the problem is fixed.
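For anyone who hits the same thing, here is a minimal sketch of the kind of rewrite described above. It is not my actual code: the table names, sizes, values and the helper `runLookupDirect` are all made up for illustration, and I am only assuming that a run-time pointer into constant memory is what the 1.x code path had trouble with. The first kernel reaches constant memory through a pointer; the second indexes each constant array by name, so no pointer to constant memory is needed.

```cuda
#include <cuda_runtime.h>

// Hypothetical lookup tables; names, sizes and values are illustrative only.
__constant__ float tableA[4] = {1.0f, 2.0f, 3.0f, 4.0f};
__constant__ float tableB[4] = {10.0f, 20.0f, 30.0f, 40.0f};

// Indirect variant: the table is selected through a pointer at run time.
// Assumption: this is the style of access that was a problem on 1.x.
__global__ void lookupIndirect(float *out, int useB)
{
    const float *table = useB ? tableB : tableA;  // pointer into constant memory
    out[threadIdx.x] = table[threadIdx.x & 3];
}

// Direct variant: each constant array is indexed by its own name,
// so the kernel never forms a pointer into constant memory.
__global__ void lookupDirect(float *out, int useB)
{
    int i = threadIdx.x & 3;
    out[threadIdx.x] = useB ? tableB[i] : tableA[i];
}

// Hypothetical host helper: launches the direct variant with n threads
// and copies n results back. Returns 0 on success, -1 on any CUDA error.
int runLookupDirect(float *hostOut, int n, int useB)
{
    float *devOut = NULL;
    if (cudaMalloc((void **)&devOut, n * sizeof(float)) != cudaSuccess)
        return -1;
    lookupDirect<<<1, n>>>(devOut, useB);
    cudaError_t err = cudaMemcpy(hostOut, devOut, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(devOut);
    return err == cudaSuccess ? 0 : -1;
}
```

Both kernels compute the same values; the only difference is how the constant tables are addressed. With the CUDA 6.5 toolchain, a 1.x target would be selected with something like `nvcc -arch=sm_13`, although I have not checked that this sketch reproduces the original compiler message.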