Hi all!
I’m new to the CUDA world and I’ve read some of the examples in the SDK. In the source code I’ve read, I couldn’t find any “const” keyword, although I’ve been taught that (in C++) it plays quite an important role in letting the compiler fully optimize the code.
For example, the matrix multiplication code from the SDK:
__global__ void
matrixMul( float* C, float* A, float* B, int wA, int wB)
{
    // Block index
    int bx = blockIdx.x;
    int by = blockIdx.y;

    // Thread index
    int tx = threadIdx.x;
    int ty = threadIdx.y;

    //...
}
doesn’t have any “const”, even though these four variables never change during the execution of the kernel.
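For comparison, here is a sketch of what a const-qualified kernel could look like. It is a naive (untiled) multiply, not the SDK’s shared-memory matrixMul, and the names matrixMulNaive and multiplyOnGpu, as well as the extra hA parameter used for bounds checking, are hypothetical. Because nvcc can usually see on its own that locals such as row and col are never reassigned, const on them mainly documents intent. Const on the pointed-to input data, combined with __restrict__ (no aliasing), is the form more likely to affect the generated code.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Naive C = A * B for row-major matrices: A is hA x wA, B is wA x wB.
// Hypothetical kernel; NOT the SDK's tiled matrixMul.
__global__ void matrixMulNaive(float* __restrict__ C,
                               const float* __restrict__ A,  // read-only input
                               const float* __restrict__ B,  // read-only input
                               const int hA, const int wA, const int wB)
{
    // These indices never change after initialisation, so mark them const.
    const int row = blockIdx.y * blockDim.y + threadIdx.y;
    const int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= hA || col >= wB) return;

    float sum = 0.0f;  // the accumulator is modified, so it cannot be const
    for (int k = 0; k < wA; ++k)
        sum += A[row * wA + k] * B[k * wB + col];
    C[row * wB + col] = sum;
}

// Hypothetical host helper: copies the inputs to the GPU, launches the
// kernel and returns C. Error checking is omitted for brevity.
std::vector<float> multiplyOnGpu(const std::vector<float>& A,
                                 const std::vector<float>& B,
                                 const int hA, const int wA, const int wB)
{
    float *dA, *dB, *dC;
    cudaMalloc(&dA, A.size() * sizeof(float));
    cudaMalloc(&dB, B.size() * sizeof(float));
    cudaMalloc(&dC, hA * wB * sizeof(float));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice);

    const dim3 block(16, 16);
    const dim3 grid((wB + block.x - 1) / block.x, (hA + block.y - 1) / block.y);
    matrixMulNaive<<<grid, block>>>(dC, dA, dB, hA, wA, wB);

    std::vector<float> C(hA * wB);
    cudaMemcpy(C.data(), dC, C.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Compiling this version and a copy with the const qualifiers stripped, then comparing the generated code, is one way to answer the question for a specific kernel.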
This leads me to three questions:
1) Has this keyword been forgotten here?
2) As the answer to 1) is probably “no”, can I deduce that it is useless if one considers performance only?
3) Is this true in every case?
Thanks to anyone who can help me!
I don’t have a detailed answer, but I can tell you that I recall observing smaller code size after compiling with consts than without them.
Smaller code size suggests (but of course doesn’t guarantee) faster execution/better optimization.
Try it out on your own code and see what size you get for the binary.
Only try this with a regular build without debugging, as I believe debugging information prevents these optimizations (nvcc’s -G flag, for one, disables device-code optimization).
I’ll go try it on my own code too.
Ok, here’s what I did:
I ran a “lengthy” kernel, which took a total of:
real 5m35.241s
user 5m29.800s
sys 0m10.800s
My binary (a .so shared library) was 238902 bytes long.
I removed all consts from my source files (sed -i 's/const/ /g' *.cu) and then restored the few that were needed to make it compile.
The binary grew to 239056 bytes (+0.06%).
real 5m36.901s (336901 ms vs. 335241 ms: 0.50% increase)
user 5m30.970s (330970 ms vs. 329800 ms: 0.35% increase)
sys 0m11.380s (11380 ms vs. 10800 ms: 5.4% increase)
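A quick check of the arithmetic, with each figure expressed as the relative increase over the run with consts (sizes in bytes, times in seconds):

```latex
\Delta = \frac{x_{\text{without const}} - x_{\text{with const}}}{x_{\text{with const}}}

\begin{aligned}
\text{size:}\ & \frac{239056 - 238902}{238902} = \frac{154}{238902} \approx 0.064\% \\
\text{real:}\ & \frac{336.901 - 335.241}{335.241} = \frac{1.660}{335.241} \approx 0.50\% \\
\text{user:}\ & \frac{330.970 - 329.800}{329.800} = \frac{1.170}{329.800} \approx 0.35\% \\
\text{sys:}\ & \frac{11.380 - 10.800}{10.800} = \frac{0.580}{10.800} \approx 5.4\%
\end{aligned}
```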
All in all, execution time wasn’t affected much.
The binary grew even though the source got shorter, which suggests that const does help the optimizer somewhat.
The slowdown without consts was very small, but it might be larger in another program, or for someone with a deeper understanding of the optimization process. Also note that I only ran each version once, so I can’t claim any statistical significance.
For optimization advice, read the CUDA Best Practices Guide: http://developer.download.nvidia.com/compute/cuda/2_3/toolkit/docs/NVIDIA_CUDA_BestPracticesGuide_2.3.pdf
What matters more to me, however, is that consts save a lot of developer time: not only do they prevent human mistakes, but a global const is also available both inside and outside kernels, which saves me the pain of passing lots of constants in as kernel arguments.
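A minimal sketch of that last point, with hypothetical names (kTileWidth, tilesNeeded, paddedWidthKernel, paddedWidthOnGpu). It relies on nvcc letting device code read the value of a namespace-scope const of integral type that is initialized with a constant expression; the CUDA Programming Guide’s rules on const-qualified variables give the exact conditions for your toolkit version.

```cuda
#include <cuda_runtime.h>

// Hypothetical tile width, defined once at namespace scope. It is const,
// integral and initialised with a constant expression, so device code may
// use its value directly without a __constant__ copy or a kernel argument.
const int kTileWidth = 16;

// Callable from both host and device code; reads kTileWidth on either side.
__host__ __device__ inline int tilesNeeded(const int n)
{
    return (n + kTileWidth - 1) / kTileWidth;
}

// Device side: thread 0 of each block writes n rounded up to a whole tile.
__global__ void paddedWidthKernel(int* out, const int n)
{
    if (threadIdx.x == 0)
        out[blockIdx.x] = tilesNeeded(n) * kTileWidth;
}

// Host side: the same constant sizes the launch. Error checking omitted.
int paddedWidthOnGpu(const int n)
{
    int* dOut;
    cudaMalloc(&dOut, sizeof(int));
    paddedWidthKernel<<<1, kTileWidth>>>(dOut, n);
    int result = 0;
    cudaMemcpy(&result, dOut, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return result;
}
```

Changing kTileWidth in one place updates both the launch configuration and the device-side arithmetic.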
YMMV. Good luck.
Thanks a lot for your detailed answer.
I’ll try out both options and then decide which one is faster.
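When comparing the two variants, timing the kernel itself with CUDA events is less noisy than timing the whole process with time, and averaging over several launches addresses the single-run caveat mentioned earlier in the thread. This is a sketch; scaleKernel and timeScaleKernel are hypothetical stand-ins for your own kernel and harness.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel to benchmark; substitute the const and non-const
// variants of your own kernel.
__global__ void scaleKernel(float* __restrict__ out, const float* __restrict__ in,
                            const float factor, const int n)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = factor * in[i];
}

// Returns the average time per launch in milliseconds, measured with CUDA
// events so that only GPU work between the two events is counted.
float timeScaleKernel(const int n, const int repetitions)
{
    float *dIn, *dOut;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemset(dIn, 0, n * sizeof(float));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scaleKernel<<<blocks, threads>>>(dOut, dIn, 2.0f, n);  // warm-up, not timed

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    for (int r = 0; r < repetitions; ++r)
        scaleKernel<<<blocks, threads>>>(dOut, dIn, 2.0f, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until all timed launches have finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dIn);
    cudaFree(dOut);
    return ms / repetitions;
}
```

Running this once per build (with and without const) and repeating the whole measurement a few times gives a fairer comparison than a single wall-clock run.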