Are “static const var” inside a kernel and outside a kernel the same in the hardware?
I’m not sure what “in the hardware” means.
The two realizations would have different scope. I guess “outside the kernel” would be at global scope:
const int val = 3;
__global__ void k(...) {...};
Inside the kernel would be at function scope:
__global__ void k(...) {const int val = 3; ...};
In the global scope example, and assuming your variable definition met certain requirements, you could use that either in host code, or device code, or both. How exactly the compiler will manifest that in each case is not something that can be answered without more of a test case.
In the function scope example, the variable definition would have fewer requirements to be usable in device code, but would not be usable in host code, or outside the function scope it was defined in.
AFAIK, the use of static doesn’t change the above description much. static has a defined meaning for C++; I won’t bother repeating it here. static does not imply global scope automatically.
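As a quick refresher on that C++ meaning, here is a minimal host-only sketch (plain C++, no CUDA involved). The names file_level, read_file_level, read_local and next_id are made up for illustration. It only shows the language rules: static at namespace scope affects linkage, while static inside a function affects storage duration and leaves the name local to that function. It says nothing about how nvcc places such variables in device code.

```cpp
#include <cassert>

// Namespace scope: 'static' gives internal linkage (the name is visible only
// in this translation unit). A const int at namespace scope already has
// internal linkage in C++, so 'static' is redundant here.
static const int file_level = 3;

int read_file_level() { return file_level; }

int read_local() {
    // Function scope: 'static' changes storage duration, not scope.
    // 'local' cannot be named outside read_local().
    static const int local = 3;
    return local;
}

int next_id() {
    // A non-const function-local static keeps its value across calls,
    // yet it is still invisible outside this function.
    static int counter = 0;
    return ++counter;
}
```

Calling read_file_level() and read_local() both return 3, and successive calls to next_id() return 1, then 2, which is the "persists but stays local" behavior the keyword provides.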
AFAIK, the compiler has several different methods to handle const-qualified variables. Any answer could change from one CUDA version to the next. Therefore the best answer to such a question (“how does the compiler handle a const-qualified variable?”) is to study the SASS code from representative cases. Here is an example:
# cat t8.cu
#include <iostream>
#include <cstdio>
static const int var = 3;
__global__ void k(){
// static const int var = 3;
printf("%d\n", var);
}
int main(){
k<<<1,1>>>();
cudaDeviceSynchronize();
}
# nvcc -o t8 t8.cu
# ./t8
3
# cuobjdump -sass t8
Fatbin elf code:
================
arch = sm_52
code version = [1,7]
host = linux
compile_size = 64bit
code for sm_52
Fatbin elf code:
================
arch = sm_52
code version = [1,7]
host = linux
compile_size = 64bit
code for sm_52
Function : _Z1kv
.headerflags @"EF_CUDA_TEXMODE_UNIFIED EF_CUDA_64BIT_ADDRESS EF_CUDA_SM52 EF_CUDA_VIRTUAL_SM(EF_CUDA_SM52)"
/* 0x001fc800fe2007f6 */
/*0008*/ MOV R1, c[0x0][0x20] ; /* 0x4c98078000870001 */
/*0010*/ IADD32I R1, R1, -0x8 ; /* 0x1c0fffffff870101 */
/*0018*/ MOV32I R0, 0x3 ; /* 0x010000000037f000 */
/* 0x001fc4001e2007f0 */
/*0028*/ MOV32I R4, 0x0 ; /* 0x010000000007f004 */
/*0030*/ STL [R1], R0 ; /* 0xef54000000070100 */
/*0038*/ MOV32I R5, 0x0 ; /* 0x010000000007f005 */
/* 0x003ff400fcc007f1 */
/*0048*/ MOV R7, RZ ; /* 0x5c9807800ff70007 */
/*0050*/ LOP.OR R6, R1, c[0x0][0x4] ; /* 0x4c47020000170106 */
/*0058*/ JCAL 0x0 ; /* 0xe220000000000040 */
/* 0x001ffc00fd4007ef */
/*0068*/ NOP ; /* 0x50b0000000070f00 */
/*0070*/ NOP ; /* 0x50b0000000070f00 */
/*0078*/ EXIT ; /* 0xe30000000007000f */
/* 0x001f8000fc0007ff */
/*0088*/ BRA 0x80 ; /* 0xe2400fffff07000f */
/*0090*/ NOP; /* 0x50b0000000070f00 */
/*0098*/ NOP; /* 0x50b0000000070f00 */
/* 0x001f8000fc0007e0 */
/*00a8*/ NOP; /* 0x50b0000000070f00 */
/*00b0*/ NOP; /* 0x50b0000000070f00 */
/*00b8*/ NOP; /* 0x50b0000000070f00 */
..........
Fatbin ptx code:
================
arch = sm_52
code version = [8,2]
host = linux
compile_size = 64bit
compressed
#
In this particular case, we see that the compiler has hard-coded the constant into the instruction stream:
/*0018*/ MOV32I R0, 0x3 ; /* 0x010000000037f000 */
Thanks, so static is not the same as in C++…
So #define val 3 is a little bit more efficient?
I think static is the same. I’m not sure how you came to that conclusion.
I doubt it. (I don’t think you can get more efficient than having that 3 hardcoded in the instruction stream the way I indicated.) However, the possible questions here are endless and I probably won’t be able to respond to a stream of them. I believe I have given you enough information to at least start the process of learning how to answer them yourself if you wish.
Also, with a bit of searching effort, you may be able to find other, similar questions, such as this one. Good luck!
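For the #define question, a small host-side C++ sketch may help frame the comparison. This is plain C++, not CUDA, and the names VAL_MACRO, val_const, times_macro and times_const are hypothetical. As far as the language is concerned, both forms are compile-time constants, so an optimizing compiler starts from the same information in either case. Whether the generated device code is actually identical for a given build is something only the SASS can confirm.

```cpp
#include <cassert>

#define VAL_MACRO 3              // textual substitution by the preprocessor
static const int val_const = 3;  // typed constant that obeys scope rules

// Both are usable where a constant expression is required.
static_assert(VAL_MACRO == 3, "macro expands to a literal");
static_assert(val_const == 3, "const int with a constant initializer");

int buf_macro[VAL_MACRO];
int buf_const[val_const];  // legal: val_const is an integral constant expression

int times_macro(int x) { return x * VAL_MACRO; }
int times_const(int x) { return x * val_const; }
```

With either form, times_macro(5) and times_const(5) both yield 15, and the two arrays have the same size. The practical difference is that the const version has a type and respects scope, which the macro does not.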