I want to compare two matrices elementwise to check whether they are identical. If two values differ, I want to signal this, e.g. by setting a flag to 1.
Is it a good idea to use shared memory for the comparison? And how can I implement the “flag”?
No speed benefit is gained from shared memory here: each element is read exactly once.
I propose returning either the Mean Square Error or the maximum error as a success metric, not just a flag. It is then up to the caller to determine whether the computation was successful (i.e. within bounds).
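As a minimal sketch of that idea (the kernel name and parameters are my own, not anything from this thread): a kernel that tracks the maximum absolute difference. Absolute differences are non-negative, and for non-negative IEEE-754 floats the unsigned bit pattern orders the same way as the value, so an integer atomicMax works:

#include <cuda_runtime.h>

// Tracks the maximum absolute elementwise difference between a and b.
// maxErrBits must be initialized to 0 before the launch.
__global__ void maxErrorKernel(const float* a, const float* b, int n,
                               unsigned int* maxErrBits)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float diff = fabsf(a[i] - b[i]);
        // diff >= 0, so its bit pattern orders like the float value itself.
        atomicMax(maxErrBits, __float_as_uint(diff));
    }
}

The host zeroes maxErrBits before the launch, copies it back afterwards, and reinterprets the bits as a float (e.g. with memcpy) before comparing against a tolerance.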
I now have a variable __device__ int flag = 0; defined outside my host code. I can read and write it from host code, but my kernel (which gets the variable as a parameter) does not change the value.
No need for cudaMalloc. The code above is completely wrong…
__device__ int flag;

int initflag = 0;
// Pass the symbol itself, not the string "flag": string symbol names
// are no longer accepted by the runtime (removed in CUDA 5.0).
cudaMemcpyToSymbol(flag, &initflag, sizeof(int), 0, cudaMemcpyHostToDevice);
The code above should be enough to set the flag from the host.
Kernels can directly access the “flag” and set it.
You need to do a “cudaMemcpyFromSymbol” in the host afterwards to see the value set by the kernel, in case you want to…
Also, the data-direction argument looks redundant for “cudaMemcpyToSymbol” – the “ToSymbol” itself implies host to device. (In fact, that parameter defaults to cudaMemcpyHostToDevice and can be omitted; it also accepts cudaMemcpyDeviceToDevice for device-to-device copies.) Anyway…
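To tie the pieces together, here is a minimal end-to-end sketch (the kernel name, launch configuration, and all-zero test data are my own inventions, just for illustration):

#include <cstdio>
#include <cuda_runtime.h>

__device__ int flag;

// Raises the global flag if any pair of elements differs.
__global__ void compareKernel(const float* a, const float* b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && a[i] != b[i])
        flag = 1;   // benign race: every writer stores the same value
}

int main()
{
    const int n = 1024;
    float *d_a, *d_b;
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMalloc(&d_b, n * sizeof(float));
    cudaMemset(d_a, 0, n * sizeof(float));   // both all-zero, so they match
    cudaMemset(d_b, 0, n * sizeof(float));

    int initflag = 0;
    cudaMemcpyToSymbol(flag, &initflag, sizeof(int));   // reset before launch

    compareKernel<<<(n + 255) / 256, 256>>>(d_a, d_b, n);

    int result;
    cudaMemcpyFromSymbol(&result, flag, sizeof(int));   // read the verdict
    printf("matrices %s\n", result ? "differ" : "are identical");

    cudaFree(d_a);
    cudaFree(d_b);
    return 0;
}

Note that the kernel references the __device__ variable by name; it is not passed as a parameter. Passing it as a parameter copies its value, which is why writes to the parameter never reach the symbol – the problem described above.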
Funny, I’ve implemented the same routine, ’cause I wanted to check for errors between two matrices. The suggested way is how I did it too, and it works perfectly.
Did you use cudaMemcpyFromSymbol to read out the flag in your host code?
[edit]
If you declared your flag variable as Sarnath showed you, it IS accessible by all kernels…
[/edit]