Constant memory management on multiple GPUs (global constant memory, multi-GPU)


I’m working on a multi-GPU version of an encryption algorithm. In this algorithm I use a device constant variable to hold the salt values. In the single-GPU version I use only one variable, for example,

__device__ __constant__ d_salt[8];

For the multi-GPU version I use one variable for each card:

__device__ __constant__ d_salt0[8]; //Card 0
__device__ __constant__ d_salt1[8]; //Card 1

Well, in the multi-GPU version, when I try to copy the salt data from CPU to GPU, execution shows a “segmentation fault” error on the cudaMemcpyToSymbol instruction. I copy the same salt data to both constant variables, and a different card is selected for each copy.

Both variables are declared at global scope in the kernel functions file.

I don’t know what the problem is. Help?

Thanks a lot.


It would be necessary to define what datatype it is:

__constant__ float d_salt[8]; or

__constant__ int d_salt[8];

But I’m guessing you have a typo in the above code.

You should be able to have just one d_salt variable without any issues in your multi-GPU solution. To my understanding, each context keeps track of its own copy of the symbol.
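A minimal sketch of that idea, assuming a single host thread that switches cards with cudaSetDevice (the salt contents and the unsigned char type are taken from the later posts in this thread; error handling is illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One symbol only; each device context gets its own instance of it.
__device__ __constant__ unsigned char d_salt[8];

int main(void) {
    unsigned char salt[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int ndev = 0;
    cudaGetDeviceCount(&ndev);

    // Copy the same host data into the per-device instance of d_salt.
    for (int dev = 0; dev < ndev; ++dev) {
        cudaSetDevice(dev);
        cudaError_t err = cudaMemcpyToSymbol(d_salt, salt, sizeof(salt));
        if (err != cudaSuccess)
            printf("device %d: %s\n", dev, cudaGetErrorString(err));
    }
    return 0;
}
```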

Hi! Thanks for your reply.

Sorry, I had a typo when writing the previous post; I forgot the type of the variable :wallbash:

In my first version of the algorithm I used only one d_salt variable, global for all cards/contexts, but I always got the same “segmentation fault” error. Because of that, I assumed I needed one d_salt variable per card/context.

The instruction that raises the error is cutilSafeCall(cudaMemcpyToSymbol(d_salt, salt, LEN_SALT * sizeof(unsigned char)));, and the context is fixed before this call with cudaSetDevice(device) and cutStartThread(…). salt is allocated in CPU memory and has the same size as d_salt.

I don’t understand what is happening.



A segmentation fault suggests an issue with either the copy kind specified or the source pointer. The following snippet should work; you can go from there.

__device__ __constant__ int d_salt[8];

int main(int argc, char *argv[]) {
    int h_salt[8] = {1, 2, 3, 4, 5, 6, 7, 8};

    // Note: the fourth parameter is the byte offset; the copy kind comes fifth.
    cudaMemcpyToSymbol(d_salt, h_salt, 8 * sizeof(int), 0, cudaMemcpyHostToDevice);

    return 0;
}

The expression “d_salt global for all cards/contexts” alerts me to an issue that may need to be addressed. To amplify Jimmy’s remark with respect to multi-GPU programming: __device__ variables are implicitly per-device. That means that in the following code, the addresses of d_salt may be different on the different devices (since they are different allocations). If the two devices support unified addressing (Fermi with 64-bit Linux, Mac, WinXP, or TCC), then the addresses are guaranteed to be different, and the following snippet will print two different addresses:

__device__ __constant__ int d_salt[8];

int main(int argc, char *argv[]) {
    void *devaddrs[2];

    cudaSetDevice(0);
    cudaGetSymbolAddress(&devaddrs[0], d_salt);

    cudaSetDevice(1);
    cudaGetSymbolAddress(&devaddrs[1], d_salt);

    printf("device 0: %p, device 1: %p\n", devaddrs[0], devaddrs[1]);

    return 0;
}
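Since the original code spawns one host thread per card with cutStartThread, here is the same single-symbol idea in a per-thread form — a sketch only, assuming pthreads and that each worker thread owns exactly one device (the worker function, thread count, and salt contents are illustrative, not the original code):

```cuda
#include <cstdio>
#include <pthread.h>
#include <cuda_runtime.h>

#define LEN_SALT 8  // salt length used in the original post

__device__ __constant__ unsigned char d_salt[LEN_SALT];

// Hypothetical per-thread worker: bind a device, then copy the salt.
// cudaSetDevice is per host thread, so each thread must call it itself
// before touching the symbol.
static void *worker(void *arg) {
    int dev = *(int *)arg;
    unsigned char salt[LEN_SALT] = {1, 2, 3, 4, 5, 6, 7, 8};

    cudaSetDevice(dev);
    cudaError_t err = cudaMemcpyToSymbol(d_salt, salt,
                                         LEN_SALT * sizeof(unsigned char));
    if (err != cudaSuccess)
        fprintf(stderr, "device %d: %s\n", dev, cudaGetErrorString(err));
    return NULL;
}

int main(void) {
    pthread_t tid[2];
    int ids[2] = {0, 1};

    for (int i = 0; i < 2; ++i)
        pthread_create(&tid[i], NULL, worker, &ids[i]);
    for (int i = 0; i < 2; ++i)
        pthread_join(tid[i], NULL);
    return 0;
}
```

If a copy still segfaults with this structure, the remaining suspects are the host-side pointer passed as the source and the argument order of cudaMemcpyToSymbol, as noted above.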