Compare two matrices...


I want to compare two matrices (elementwise) to check if they are identical. If two values are not identical, I want to signal this, e.g. set a flag to 1 or something like that.

Is it a good idea to use shared memory for the comparison? And how can I implement the “flag”?

I hope you can help me.



Shared memory gives no speed benefit here, because each element is read exactly once.

I propose returning either the mean squared error or the maximum error as a success metric, not just a flag. It is then up to the caller to decide whether the computation was successful (i.e. within bounds).
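A rough sketch of the maximum-error variant, assuming float matrices stored as flat device arrays of n elements. The names maxAbsErrorKernel and maxAbsError are made up for illustration. It relies on the fact that non-negative IEEE-754 floats compare the same way as their bit patterns read as signed integers, so the integer atomicMax can be used. One atomic per element is simple rather than fast.

```cuda
#include <cuda_runtime.h>
#include <cstring>

// Hypothetical kernel: each thread folds its absolute error into a global maximum.
__global__ void maxAbsErrorKernel(const float* a, const float* b, int n, int* maxErrBits)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float err = fabsf(a[i] - b[i]);
        // err >= 0, so comparing the raw bits as ints matches comparing the floats.
        atomicMax(maxErrBits, __float_as_int(err));
    }
}

// Hypothetical host helper: dA and dB are device pointers.
float maxAbsError(const float* dA, const float* dB, int n)
{
    if (n <= 0) return 0.0f;

    int* dBits = nullptr;
    cudaMalloc(&dBits, sizeof(int));
    cudaMemset(dBits, 0, sizeof(int));            // all-zero bits == 0.0f

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    maxAbsErrorKernel<<<blocks, threads>>>(dA, dB, n, dBits);

    int bits = 0;
    cudaMemcpy(&bits, dBits, sizeof(int), cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(dBits);

    float result;
    std::memcpy(&result, &bits, sizeof(float));   // reinterpret the bits as a float
    return result;
}
```

The caller can then compare the returned value against whatever tolerance it considers acceptable; a result of exactly 0.0f means the matrices are identical.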

Shared memory should be used like a cache, i.e. it only pays off if you repeatedly re-use…

As for signalling, there is no way in CUDA to halt the whole kernel. At most you can return early from the current block.

The solution would be to:

  1. Have a flag in global memory (initialized to 0)

  2. Each block (one thread representing the block) checks whether the flag is set. If it is set, just return

  3. If it is not set, go ahead with the comparison. Set the flag if the values are unequal (using the representative thread for the block)

Spawn more blocks and have fewer threads per block…
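The three steps above could be sketched roughly like this, assuming float matrices stored as flat device arrays of n elements. The names mismatchFlag and compareKernel are made up for illustration, and __syncthreads_or needs compute capability 2.0 or newer. The early exit is best effort only: blocks that are already past the check keep running.

```cuda
#include <cuda_runtime.h>

// Step 1: flag in global memory (hypothetical name); the host must reset it to 0 before launch.
__device__ int mismatchFlag;

__global__ void compareKernel(const float* a, const float* b, int n)
{
    // Step 2: one representative thread reads the flag for the whole block.
    __shared__ int alreadySet;
    if (threadIdx.x == 0)
        alreadySet = *(volatile int*)&mismatchFlag;  // volatile: force a real memory read
    __syncthreads();
    if (alreadySet)
        return;                                      // uniform per block, so the whole block leaves

    // Step 3: compare, then combine the per-thread results across the block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int differs = (i < n) && (a[i] != b[i]);
    int blockDiffers = __syncthreads_or(differs);    // non-zero if any thread saw a mismatch

    // The representative thread publishes the result.
    if (threadIdx.x == 0 && blockDiffers)
        mismatchFlag = 1;
}
```

Every writer stores the same value, so no atomic operation is needed for the flag itself.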


Thank you for your answers!

I now have a variable __device__ int flag = 0; defined outside my host code. I can read and write it from host code, but my kernel (which gets the variable as a parameter) does not change the value.

You probably didn't allocate memory for it. You need to use cudaMalloc so that the kernel is able to see and use it.

Just declaring is not enough.



I declared the variable outside the host and kernel code like this

__device__ int flag;

and then I allocated within the host code

int initflag = 0;

cudaMalloc((void**)flag, sizeof(int));

cudaMemcpyToSymbol("flag", &initflag, sizeof(int), 0, cudaMemcpyHostToDevice);

Is this correct?

No need for cudaMalloc. The code above is completely wrong…

__device__ int flag;

int initflag = 0;

cudaMemcpyToSymbol(flag, &initflag, sizeof(int), 0, cudaMemcpyHostToDevice);

The code above is enough to set the flag from the host.

Kernels can directly access the “flag” and set it.

You need to call “cudaMemcpyFromSymbol” on the host afterwards to see the value set by the kernel, in case you want to…

Also, the data-direction argument looks redundant for “cudaMemcpyToSymbol”: the “ToSymbol” already implies host to device… I don't know how the runtime interprets it… Anyway…
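For reference, a minimal sketch of the full round trip: reset the flag from the host, let a kernel set it, and read it back. setFlagKernel is a hypothetical stand-in for the real comparison kernel, and runFlagRoundTrip is a made-up helper name. The symbol is passed directly rather than as a string, because the string form is not accepted by current CUDA toolkits, and the direction argument is left at its default.

```cuda
#include <cuda_runtime.h>

__device__ int flag;   // lives in device global memory, no cudaMalloc needed

// Hypothetical stand-in for the comparison kernel: it accesses flag directly,
// without receiving it as a parameter.
__global__ void setFlagKernel()
{
    flag = 1;
}

// Hypothetical helper: returns the flag value seen by the host after the kernel ran.
int runFlagRoundTrip()
{
    int initflag = 0;
    cudaMemcpyToSymbol(flag, &initflag, sizeof(int));   // host -> device symbol

    setFlagKernel<<<1, 1>>>();

    int result = -1;
    cudaMemcpyFromSymbol(&result, flag, sizeof(int));   // device symbol -> host, waits for the kernel
    return result;
}
```

In real code the return values of the cudaMemcpy calls should be checked, since a failed copy would otherwise look like an unchanged flag.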


Thank you for this! I tried it, but it seems that my kernel cannot change or set the value of flag. I pass flag as an argument to the comparison kernel.


Funny, I've implemented the same routine because I wanted to check for errors between two matrices. The suggested way is how I did it too, and it works perfectly.

Did you use cudaMemcpyFromSymbol to read out the flag in your host code?

If you declared your flag variable as Sarnath showed you, it IS accessible by all kernels…

Best regards,


That's the mistake… Kernels can directly access the “flag” variable. You don't need to pass it as an argument.

All you have to do is declare “__device__ int flag” ahead of the kernel in the same file (or the current compilation unit… )

If you want to pass the address, you first need to call “cudaGetSymbolAddress” to get the device address of “flag”… “&flag” in host code will NOT help in this case.

Hope you see the point now…

It's kind of fuzzy… but yeah, that's the way it is.
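If the kernel really should receive the flag as a pointer argument, the cudaGetSymbolAddress route mentioned above might look like the following sketch. setThroughPointer and runWithSymbolAddress are hypothetical names used only for illustration.

```cuda
#include <cuda_runtime.h>

__device__ int flag;

// Hypothetical kernel that receives the flag's device address as a parameter.
__global__ void setThroughPointer(int* flagPtr)
{
    *flagPtr = 1;
}

// Hypothetical helper: returns the flag value after the kernel ran.
int runWithSymbolAddress()
{
    // Ask the runtime for the device address of the symbol; &flag on the host is not usable here.
    int* dFlag = nullptr;
    cudaGetSymbolAddress((void**)&dFlag, flag);

    cudaMemset(dFlag, 0, sizeof(int));     // reset the flag through the device pointer

    setThroughPointer<<<1, 1>>>(dFlag);

    int result = -1;
    cudaMemcpy(&result, dFlag, sizeof(int), cudaMemcpyDeviceToHost);  // waits for the kernel
    return result;
}
```

Once the device address is known, the flag behaves like any other device pointer, so the ordinary cudaMemcpy and cudaMemset calls work on it.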