__constant__ memory which is device-side only (avoiding cudaMemcpyToSymbol)

I have some numerical constants (constant over the life of the application, the same values called by each thread every time a kernel is called) which I would like to reside in the device-side constant cache. Namely, I do not need them on the host, and would like to avoid using cudaMemcpyToSymbol. (Note that some of these constants are arrays which functions need the pointer to, so #define’ing is no good.)

If I set my constants as such:

__device__ __constant__ constNumber[4] = {1,2,3,4};

something goes wrong - the values, as implemented in the kernel, seem to be zero. So I’m guessing the above is the wrong way to go about this.

If there is a way to use constant and managed together, that would be appreciated as well. TIA!

what you have posted wouldn’t compile, there is no variable type (int, float, etc.) specification:

However if we fix that, what you have posted seems to work for me:

$ cat t152.cu
#include <stdio.h>

__device__ __constant__  int constNumber[4] = {1,2,3,4};

__global__ void t(){

  printf("%d\n", constNumber[3]);
}

int main(){

  t<<<1,1>>>();
  cudaDeviceSynchronize();
}
$ nvcc -arch=sm_61 -o t152 t152.cu
$ cuda-memcheck ./t152
========= CUDA-MEMCHECK
4
========= ERROR SUMMARY: 0 errors
$

This is static initialization of a device variable, and subject to some limitations, I believe it should work. Initialization of an array (or maybe it was array-of-struct) this way might be problematic, I think I’ve tried that before and you effectively need a C++11 style initialization, IIRC

Right about the type - I wasn’t copying from my code, of course.

I get illegal memory access errors when trying to refer to constNumber from within a kernel. I forgot to mention in the OP that I was wanting to avoid passing pointers to these constants to each kernel, and then passing to each function called by the kernel, etc. This is not necessary when declaring constNumber as device managed - the device functions/kernels are aware of those addresses already. So I was hoping to avoid breaking this while still holding the values in the constant cache.

Thank you!

Not sure what you’re talking about. The example I gave doesn’t involve passing pointers to the constants to the kernel, nor does it throw any illegal memory access errors.

I wasn’t using your code; I was doing as I said, which was using these values in device and global functions. This is where I get the memory error.

Maybe I am misunderstanding the use case, but statically initialized constant data has worked in CUDA from the very beginning. Here are a couple of relevant snippets from the CUDA 6.5 math library (inside the file math_functions_dbl_ptx3.h):

/* 1152 bits of 2/PI for Payne-Hanek style argument reduction. */
static __constant__ unsigned long long int __cudart_i2opi_d [] = {
  0x6bfb5fb11f8d5d08ULL,
  0x3d0739f78a5292eaULL,
  0x7527bac7ebe5f17bULL,
  0x4f463f669e5fea2dULL,
  0x6d367ecf27cb09b7ULL,
  0xef2f118b5a0a6d1fULL,
  0x1ff897ffde05980fULL,
  0x9c845f8bbdf9283bULL,
  0x3991d639835339f4ULL,
  0xe99c7026b45f7e41ULL,
  0xe88235f52ebb4484ULL,
  0xfe1deb1cb129a73eULL,
  0x06492eea09d1921cULL,
  0xb7246e3a424dd2e0ULL,
  0xfe5163abdebbc561ULL,
  0xdb6295993c439041ULL,
  0xfc2757d1f534ddc0ULL,
  0xa2f9836e4e441529ULL,
};

// [...]

#pragma unroll 1
  for (q = (idx-1); q < min(18,idx+3); q++) {
    p = __internal_umad64wide (__cudart_i2opi_d[q], ia, p.y);
    result[q-(idx-1)] = p.x;
  }

I apologize - no memory access errors. (I’m on shoddy wifi so files may not have transferred as I thought.)

But the original issue posed in the OP still stands. To be concrete:

I declare constant values via

__device__ __constant__ int constNumbers[4] = {1,2,3,4};

which I then want to use directly with the kernel

__global__ void kernel()
{
    // do stuff with constNumbers such as
    int x  = constNumbers[1];
    printf("%d",x);
}

But the output of the kernel is as if constNumbers[1] = 0.

In #2 txbob already showed that the stripped-down version of your code works just fine. This would seem to suggest you have a bug somewhere in your code.

BTW, I am not sure what the addition of device to constant is supposed to do, because constant is on the device already.

You both are correct; I apologize for misreading. I am not sure how it can be a bug in my code, for all I do is replace a managed keyword with constant which causes the issue. I will look into it.

And the device was there because the variables were originally device managed – good to know it’s not needed.

I resolved my issue; I forgot that I call a host device function exactly once from the host which uses those values. I suppose this was worth it for a 7% speed increase…

The compiler tells me that constant memory can’t be managed - is there any way to have the same variable name for both constant and host-side memory?

Thank you again for your help!