I've got a quick question that someone can probably answer in no time.
Problem definition:
I broke my code up into source and header files (.cu and .cuh). Before that it was one big chunk of code in a single file, which produced correct results (checked against CPU computation results).
Linking is fine, i.e. the kernels are invoked as needed and the compiler doesn't produce any warnings or errors.
Now the issue is with constant memory. Previously my constant 2D array was at global scope and I called cudaMemcpyToSymbol(d_matrix, &h_matrix, sizeof(h_matrix));
That copied my 2D array to the GPU, and the kernels pulled out the right values when needed. Now, after splitting up my code and defining the constant d_matrix in the header file, the results are incorrect; it seems the array is simply not being copied into the GPU's memory.
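For reference, the single-file layout described above can be sketched roughly as follows. This is only an illustration: the `float` element type, the 4x4 size, and the `read_matrix` kernel are assumptions, since the post does not show the actual code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define ROWS 4   // assumed size, not from the original post
#define COLS 4

// Constant-memory symbol defined at file scope, in the same file as the kernel.
__constant__ float d_matrix[ROWS][COLS];

// Hypothetical kernel: each thread copies one element out of constant memory.
__global__ void read_matrix(float *out)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < ROWS * COLS)
        out[idx] = d_matrix[idx / COLS][idx % COLS];
}

int main()
{
    float h_matrix[ROWS][COLS];
    for (int r = 0; r < ROWS; ++r)
        for (int c = 0; c < COLS; ++c)
            h_matrix[r][c] = static_cast<float>(r * COLS + c);

    // Host -> constant memory. The symbol itself is passed, not a string name.
    cudaError_t err = cudaMemcpyToSymbol(d_matrix, h_matrix, sizeof(h_matrix));
    if (err != cudaSuccess) {
        printf("cudaMemcpyToSymbol failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float *d_out = nullptr;
    cudaMalloc(&d_out, sizeof(h_matrix));
    read_matrix<<<1, ROWS * COLS>>>(d_out);

    float h_out[ROWS * COLS];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    // Compare what the kernel saw with what the host uploaded.
    int mismatches = 0;
    for (int i = 0; i < ROWS * COLS; ++i)
        if (h_out[i] != h_matrix[i / COLS][i % COLS])
            ++mismatches;
    printf("mismatches: %d\n", mismatches);
    return mismatches != 0;
}
```

The point of the sketch is that the definition of `d_matrix`, the kernel that reads it, and the `cudaMemcpyToSymbol` call all live in one compilation unit.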
I'm struggling to find out how to deal with this. How should I define my constant structures in a header file so that they end up in the same scope, so that calling cudaMemcpyToSymbol actually copies the 2D matrix into constant memory?
If anyone knows the solution for constant memory objects in the GPU's memory, please let me know how I can resolve this.
Thanks a bunch in advance!!
`__constant__` memory always has file scope, because CUDA has no linker on the device side. So your only option is to `#include` all code into one compilation unit.
It should work to have the constant memory definition in either the .cu or the .cuh file.
The real problem is usually that (in my experience) you can only have one definition per compilation unit, and you can't pass a reference from one compilation unit to another. Hence you need one definition at file scope, as you mentioned.
Why did you need to move it to the .cuh in the first place? Are you trying to access and update the memory externally? Because the two different memory spaces will each have their own symbol for the memory.
Here is an example of WHAT WILL NOT WORK (in my experience):
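A minimal single-file sketch of that failure mode, not the poster's original code: two namespaces stand in for two .cu files that each include a header defining `d_matrix`, so each ends up with its own symbol. The names `unit_a`, `unit_b`, and `read_matrix`, the `float` type, and the size are all hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 4   // assumed size

// Stands in for the copy of d_matrix seen by the file that does the upload.
namespace unit_a {
__constant__ float d_matrix[N];
}

// Stands in for the separate copy seen by the file that holds the kernel.
namespace unit_b {
__constant__ float d_matrix[N];

__global__ void read_matrix(float *out)
{
    int i = threadIdx.x;
    if (i < N)
        out[i] = d_matrix[i];   // reads unit_b's copy, which is never uploaded
}
}

int main()
{
    float h_matrix[N] = {1.0f, 2.0f, 3.0f, 4.0f};

    // The upload succeeds, but it targets unit_a's symbol only.
    cudaError_t err = cudaMemcpyToSymbol(unit_a::d_matrix, h_matrix, sizeof(h_matrix));
    if (err != cudaSuccess) {
        printf("cudaMemcpyToSymbol failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float *d_out = nullptr;
    cudaMalloc(&d_out, sizeof(h_matrix));
    unit_b::read_matrix<<<1, N>>>(d_out);

    float h_out[N];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    // Count how many elements the kernel saw that match the uploaded data.
    int matches = 0;
    for (int i = 0; i < N; ++i)
        if (h_out[i] == h_matrix[i])
            ++matches;
    printf("elements matching the uploaded data: %d of %d\n", matches, N);
    return 0;
}
```

No call here reports an error, which is consistent with the symptom in the question: the copy goes through, but the kernel reads a different symbol from the one that was written.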
Is there a workaround for this issue?
Like, if I put my constant struct at file scope, that will certainly resolve the issue of copying to it…
But my kernel source uses that constant structure constantly (and I need its cached access), i.e. without its declaration in the header file I get a bunch of errors when compiling the GPU code.
Otherwise, if I define it several times (in my CUDA code and in my CPU code), the compiler tells me that it's already been defined.
I'm left with only one lame solution: moving the one kernel that uses that constant memory into my source file, which really defeats the purpose of splitting up the code in the first place.
Any ideas would be appreciated.
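One possible workaround, sketched under assumptions: keep the `__constant__` definition and every kernel that reads it in a single .cu file, and let the header expose only plain host functions that wrap the upload and the launch. The names `upload_matrix`, `launch_scale`, and `scale_by_matrix`, along with the `float` type and the 2x3 size, are hypothetical. The sketch is one file so it can be compiled as is; the two prototypes at the top are what would go into the shared header.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define ROWS 2   // assumed size
#define COLS 3

// --- What a shared header would expose: host functions only (hypothetical) ---
cudaError_t upload_matrix(const float *h_matrix);
cudaError_t launch_scale(const float *d_in, float *d_out);

// --- What stays private to the one .cu file ---
__constant__ float d_matrix[ROWS][COLS];

// Multiplies each input element by the matching constant-memory entry.
__global__ void scale_by_matrix(const float *in, float *out)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < ROWS * COLS)
        out[idx] = in[idx] * d_matrix[idx / COLS][idx % COLS];
}

// The copy happens in the same compilation unit that defines the symbol.
cudaError_t upload_matrix(const float *h_matrix)
{
    return cudaMemcpyToSymbol(d_matrix, h_matrix, ROWS * COLS * sizeof(float));
}

// Other files call this wrapper instead of launching the kernel themselves.
cudaError_t launch_scale(const float *d_in, float *d_out)
{
    scale_by_matrix<<<1, ROWS * COLS>>>(d_in, d_out);
    return cudaGetLastError();
}

int main()
{
    const float h_matrix[ROWS * COLS] = {1, 2, 3, 4, 5, 6};
    const float h_in[ROWS * COLS]     = {1, 1, 1, 1, 1, 1};
    float h_out[ROWS * COLS] = {0};

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    if (upload_matrix(h_matrix) != cudaSuccess ||
        launch_scale(d_in, d_out) != cudaSuccess) {
        printf("CUDA call failed\n");
        return 1;
    }
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    // Check the device result against the same computation on the host.
    int mismatches = 0;
    for (int i = 0; i < ROWS * COLS; ++i)
        if (h_out[i] != h_in[i] * h_matrix[i])
            ++mismatches;
    printf("mismatches: %d\n", mismatches);

    cudaFree(d_in);
    cudaFree(d_out);
    return mismatches != 0;
}
```

With this layout the rest of the project never names `d_matrix` directly, so there is only one symbol to copy to. Newer toolkits also support relocatable device code (`nvcc -rdc=true`), where a header can carry an `extern __constant__` declaration that refers to a single definition; whether that is available depends on the toolkit version in use.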
I figured out the problem!
I've just used a local kernel variable that sits inside the kernel…
Ahh, slow brain.
Thank you, people, for your detailed explanation.