I have some numerical constants (constant over the life of the application, the same values called by each thread every time a kernel is called) which I would like to reside in the device-side constant cache. Namely, I do not need them on the host, and would like to avoid using cudaMemcpyToSymbol. (Note that some of these constants are arrays which functions need the pointer to, so #define’ing is no good.)
This is static initialization of a device variable, and subject to some limitations, I believe it should work. Initialization of an array (or maybe it was array-of-struct) this way might be problematic, I think I’ve tried that before and you effectively need a C++11 style initialization, IIRC
Right about the type - I wasn’t copying from my code, of course.
I get illegal memory access errors when trying to refer to constNumber from within a kernel. I forgot to mention in the OP that I was wanting to avoid passing pointers to these constants to each kernel, and then passing to each function called by the kernel, etc. This is not necessary when declaring constNumber as devicemanaged - the device functions/kernels are aware of those addresses already. So I was hoping to avoid breaking this while still holding the values in the constant cache.
Not sure what you’re talking about. The example I gave doesn’t involve passing pointers to the constants to the kernel, nor does it throw any illegal memory access errors.
Maybe I am misunderstanding the use case, but statically initialized constant data has worked in CUDA from the very beginning. Here are a couple of relevant snippets from the CUDA 6.5 math library (inside the file math_functions_dbl_ptx3.h):
/* 1152 bits of 2/PI for Payne-Hanek style argument reduction. */
static __constant__ unsigned long long int __cudart_i2opi_d [] = {
0x6bfb5fb11f8d5d08ULL,
0x3d0739f78a5292eaULL,
0x7527bac7ebe5f17bULL,
0x4f463f669e5fea2dULL,
0x6d367ecf27cb09b7ULL,
0xef2f118b5a0a6d1fULL,
0x1ff897ffde05980fULL,
0x9c845f8bbdf9283bULL,
0x3991d639835339f4ULL,
0xe99c7026b45f7e41ULL,
0xe88235f52ebb4484ULL,
0xfe1deb1cb129a73eULL,
0x06492eea09d1921cULL,
0xb7246e3a424dd2e0ULL,
0xfe5163abdebbc561ULL,
0xdb6295993c439041ULL,
0xfc2757d1f534ddc0ULL,
0xa2f9836e4e441529ULL,
};
// [...]
#pragma unroll 1
for (q = (idx-1); q < min(18,idx+3); q++) {
p = __internal_umad64wide (__cudart_i2opi_d[q], ia, p.y);
result[q-(idx-1)] = p.x;
}
In #2 txbob already showed that the stripped-down version of your code works just fine. This would seem to suggest you have a bug somewhere in your code.
BTW, I am not sure what the addition of device to constant is supposed to do, because constant is on the device already.
You both are correct; I apologize for misreading. I am not sure how it can be a bug in my code, for all I do is replace a managed keyword with constant which causes the issue. I will look into it.
And the device was there because the variables were originally devicemanaged – good to know it’s not needed.
I resolved my issue; I forgot that I call a hostdevice function exactly once from the host which uses those values. I suppose this was worth it for a 7% speed increase…
The compiler tells me that constant memory can’t be managed - is there any way to have the same variable name for both constant and host-side memory?