Hi there,
My code uses constant memory and works well on a single GPU. Since I am greedy, I want to use multiple GPUs to run the same job. The parallelization idea is quite straightforward: I use pthread.h to create 4 CPU threads (my machine has 8 cores), and each thread drives one GPU device and writes its part of the results into a big shared array. This works fine for a small test job that does not use constant memory, but for my real project I don't know where/how to declare the constant memory.
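To make the setup concrete, here is a minimal sketch of the threading scheme I have in mind (the struct, function names, and sizes are placeholders, not my actual code):

```cuda
#include <pthread.h>
#include <cuda_runtime.h>

#define NUM_GPUS 4

// Arguments handed to each CPU thread: which GPU to use and
// which slice of the shared host output array it owns.
typedef struct {
    int    device_id;
    float *results;   // big shared array on the host
    size_t offset;    // start of this thread's slice
    size_t count;     // length of this thread's slice
} worker_args;

// Each CPU thread binds itself to one GPU and computes its slice.
void *worker(void *p)
{
    worker_args *args = (worker_args *)p;

    // Bind this host thread to its own GPU; all subsequent CUDA
    // calls made from this thread go to that device.
    cudaSetDevice(args->device_id);

    // ... allocate device memory, launch kernels, and copy this
    // thread's results back into
    // args->results[args->offset .. args->offset + args->count - 1]

    return NULL;
}

int main(void)
{
    pthread_t   threads[NUM_GPUS];
    worker_args args[NUM_GPUS];

    for (int i = 0; i < NUM_GPUS; ++i) {
        args[i].device_id = i;
        // ... fill in results/offset/count for thread i ...
        pthread_create(&threads[i], NULL, worker, &args[i]);
    }
    for (int i = 0; i < NUM_GPUS; ++i)
        pthread_join(threads[i], NULL);
    return 0;
}
```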
For the single-GPU code, I declared it at the very beginning and performed the copy as follows:
[codebox]__device__ __constant__ float c_A_n[CHUNK_SIZE_fwd];
…
CUDA_SAFE_CALL(cudaMemcpyToSymbol(c_A_n, &trueA[(nyx+ny)*CHUNK_SIZE_fwd],
                                  sizeof(float)*CHUNK_SIZE_fwd, 0,
                                  cudaMemcpyHostToDevice));[/codebox]
Apparently I cannot do this anymore, since the compiler may get confused about which device I mean. So I moved the same lines into the pthread functions, expecting each CPU thread to declare constant memory on its corresponding GPU. The code compiles, but when I run it I get this error:
multifwd_func.cu:95: error: cannot convert ‘float’ to ‘const char’ for argument ‘1’ to ‘cudaError_t cudaMemcpyToSymbol(const char*, const void*, size_t, size_t, cudaMemcpyKind)’
I did not get this error with the same code on a single GPU, and I don't know what to do. Is there a way to specify the device when declaring constant memory, or can I just put the same declaration in a smarter place?
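One arrangement I am considering (just a sketch, I don't know if it is correct): keep the single file-scope declaration exactly as in the single-GPU version, and have each CPU thread call cudaMemcpyToSymbol only after its own cudaSetDevice, hoping that each device gets its own copy of the symbol. CHUNK_SIZE_fwd and host_chunk below are placeholders standing in for my real sizes and the &trueA[...] slice:

```cuda
#include <pthread.h>
#include <cuda_runtime.h>

#define CHUNK_SIZE_fwd 1024   // placeholder value for illustration

// Single file-scope declaration, as in the single-GPU version.
// The hope is that each device/context gets its own instance.
__device__ __constant__ float c_A_n[CHUNK_SIZE_fwd];

void *fwd_thread_func(void *p)
{
    int device_id = *(int *)p;

    // Bind this host thread to its GPU first ...
    cudaSetDevice(device_id);

    // ... then copy into (hopefully) that device's copy of c_A_n.
    // host_chunk stands in for the &trueA[(nyx+ny)*CHUNK_SIZE_fwd]
    // slice in my real code.
    float host_chunk[CHUNK_SIZE_fwd] = {0};
    cudaMemcpyToSymbol(c_A_n, host_chunk,
                       sizeof(float) * CHUNK_SIZE_fwd, 0,
                       cudaMemcpyHostToDevice);

    // ... launch kernels that read c_A_n on this device ...
    return NULL;
}
```

Would this work, or does the file-scope symbol only exist on one device?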
Thanks.
-Kun