Constant memory management on multiple GPUs (global constant memory, multi-GPU)


I’m working on a multi-GPU version of an encryption algorithm. In this algorithm I use a device constant variable to hold the salt values. In the single-GPU version I use only one variable, for example,

__device__ __constant__ d_salt[8];

For the multi-GPU version I use one variable for each card:

__device__ __constant__ d_salt0[8]; //Card 0
__device__ __constant__ d_salt1[8]; //Card 1

Well, in the multi-GPU version, when I try to copy the salt data from CPU to GPU, execution shows a “segmentation fault” error on the cudaMemcpyToSymbol instruction. I copy the same salt data to both constant variables, and a different card is selected for each copy.

Both variables are declared at global scope in the kernel functions file.

I don’t know what the problem is. Help?

Thanks a lot.


It would be necessary to define what datatype it is:

__constant__ float d_salt[8]; or

__constant__ int d_salt[8];

But I’m guessing you have a typo in the above code.

You should be able to have just one d_salt variable without any issues in your multi-GPU solution. To my understanding, each context keeps track of its own copy of the symbol.
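A minimal sketch of that idea, assuming a single host thread that switches cards with cudaSetDevice (the salt contents and the unsigned char type are taken from the later posts in this thread; error handling is illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One symbol only; each device context gets its own instance of it.
__device__ __constant__ unsigned char d_salt[8];

int main(void) {
    unsigned char salt[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int ndev = 0;
    cudaGetDeviceCount(&ndev);

    // Copy the same host data into the per-device instance of d_salt.
    for (int dev = 0; dev < ndev; ++dev) {
        cudaSetDevice(dev);
        cudaError_t err = cudaMemcpyToSymbol(d_salt, salt, sizeof(salt));
        if (err != cudaSuccess)
            printf("device %d: %s\n", dev, cudaGetErrorString(err));
    }
    return 0;
}
```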

Hi! Thanks for your reply.

Sorry, I had a typo when writing the previous post; I forgot the type of the variable :wallbash:

In my first version of the algorithm I used only one d_salt variable, global for all cards/contexts, but I always got the same “segmentation fault” error. Because of that, I assumed I needed one d_salt variable per card/context.

The instruction that raises the error is cutilSafeCall(cudaMemcpyToSymbol(d_salt, salt, LEN_SALT * sizeof(unsigned char)));, and the context is fixed before this call with cudaSetDevice(device) and cutStartThread(…). salt is allocated in CPU memory and has the same size as d_salt.

I don’t understand what is happening.



A segmentation fault suggests an issue with either the copy kind specified or the source pointer. The following snippet should work; you can go from there.

__device__ __constant__ int d_salt[8];

int main(int argc, char *argv[]) {
    int h_salt[8] = {1, 2, 3, 4, 5, 6, 7, 8};

    // Note: the fourth parameter is the byte offset; the copy kind comes fifth.
    cudaMemcpyToSymbol(d_salt, h_salt, 8 * sizeof(int), 0, cudaMemcpyHostToDevice);

    return 0;
}

The expression “d_salt global for all cards/contexts” alerts me to an issue that may need to be addressed. To amplify Jimmy’s remark with respect to multi-GPU programming: __device__ variables are implicitly per-device. That means that in the following code, the addresses of d_salt may be different on the different devices (since they are different allocations). If the two devices support unified addressing (Fermi with 64-bit Linux, Mac, WinXP, or TCC), then the addresses are guaranteed to be different, and the following snippet will print two different addresses:

__device__ __constant__ int d_salt[8];

int main(int argc, char *argv[]) {
    void *devaddrs[2];

    cudaSetDevice(0);
    cudaGetSymbolAddress(&devaddrs[0], d_salt);

    cudaSetDevice(1);
    cudaGetSymbolAddress(&devaddrs[1], d_salt);

    printf("device 0: %p, device 1: %p\n", devaddrs[0], devaddrs[1]);

    return 0;
}
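Since the original code spawns one host thread per card with cutStartThread, here is the same single-symbol idea in a per-thread form — a sketch only, assuming pthreads and that each worker thread owns exactly one device (the worker function, thread count, and salt contents are illustrative, not the original code):

```cuda
#include <cstdio>
#include <pthread.h>
#include <cuda_runtime.h>

#define LEN_SALT 8  // salt length used in the original post

__device__ __constant__ unsigned char d_salt[LEN_SALT];

// Hypothetical per-thread worker: bind a device, then copy the salt.
// cudaSetDevice is per host thread, so each thread must call it itself
// before touching the symbol.
static void *worker(void *arg) {
    int dev = *(int *)arg;
    unsigned char salt[LEN_SALT] = {1, 2, 3, 4, 5, 6, 7, 8};

    cudaSetDevice(dev);
    cudaError_t err = cudaMemcpyToSymbol(d_salt, salt,
                                         LEN_SALT * sizeof(unsigned char));
    if (err != cudaSuccess)
        fprintf(stderr, "device %d: %s\n", dev, cudaGetErrorString(err));
    return NULL;
}

int main(void) {
    pthread_t tid[2];
    int ids[2] = {0, 1};

    for (int i = 0; i < 2; ++i)
        pthread_create(&tid[i], NULL, worker, &ids[i]);
    for (int i = 0; i < 2; ++i)
        pthread_join(tid[i], NULL);
    return 0;
}
```

If a copy still segfaults with this structure, the remaining suspects are the host-side pointer passed as the source and the argument order of cudaMemcpyToSymbol, as noted above.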