Compare two matrices...

timmy08 · October 25, 2009, 3:06pm

Hi,

I want to compare two matrices (elementwise) to check if they are identical. If two values are not identical, I want to signal this, e.g. set a flag to 1 or something like that.

Is it a good idea to use shared memory for the comparison? And how can I implement the “flag”?

I hope you can help me :rolleyes:

Ciao,

T.

cbuchner1 · October 25, 2009, 3:31pm

There is no speed benefit from shared memory here, because each element is read exactly once.

I propose returning either the mean square error or the maximum error as a success metric, not just a flag. It is then up to the caller to decide whether the computation was successful (i.e. within bounds).
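Below is a minimal sketch of the maximum-error variant. It assumes both matrices are float arrays stored flat with n elements. The names maxAbsErrorKernel and maxAbsError are hypothetical, and CUDA error checking is left out to keep it short. CUDA has no native atomicMax for float, so the sketch uses a property of IEEE-754: non-negative floats sort in the same order as their bit patterns read as signed ints. That lets the integer atomicMax track the largest |a[i] - b[i]|.

```cuda
#include <cuda_runtime.h>
#include <cstring>   // std::memcpy

// Each thread compares one element and folds |a[i] - b[i]| into *maxErrBits.
// Assumption: errors are non-negative floats, whose IEEE-754 bit patterns
// sort like signed ints, so the integer atomicMax gives the float maximum.
__global__ void maxAbsErrorKernel(const float* a, const float* b, int n,
                                  int* maxErrBits)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float err = fabsf(a[i] - b[i]);
        atomicMax(maxErrBits, __float_as_int(err));
    }
}

// Hypothetical host helper: returns max |hA[i] - hB[i]| over n elements.
float maxAbsError(const float* hA, const float* hB, int n)
{
    if (n <= 0) return 0.0f;
    size_t bytes = n * sizeof(float);
    float *dA, *dB;
    int* dMaxBits;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dMaxBits, sizeof(int));
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);
    cudaMemset(dMaxBits, 0, sizeof(int));   // all-zero bits == 0.0f

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    maxAbsErrorKernel<<<blocks, threads>>>(dA, dB, n, dMaxBits);

    int bits = 0;
    cudaMemcpy(&bits, dMaxBits, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dMaxBits);

    float maxErr;
    std::memcpy(&maxErr, &bits, sizeof(float));  // reinterpret the bits as float
    return maxErr;
}
```

The caller can compare the returned value against its own tolerance. A mean-square-error variant could accumulate squared differences with atomicAdd on a float instead. Global memory supports that on devices of compute capability 2.0 and later. For large matrices, doing a per-block reduction before the atomic would reduce contention.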

Sarnath · October 26, 2009, 8:33am

Shared memory should be used like a cache, i.e. it only pays off if you re-use the data repeatedly.

As for signalling: there is no way in CUDA to halt the whole kernel. At most, you can return from the current block.

The solution would be to:

1. Keep a flag in global memory (initialized to 0).
2. Have each block (one thread representing the block) check whether the flag is set. If it is set, just return.
3. If it is not set, go ahead with the comparison, and set the flag if any elements are unequal (again using the representative thread for the block).

Launch more blocks with fewer threads per block, so that blocks scheduled after a mismatch has been found can exit early.
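The scheme above might look roughly like the following sketch. It assumes flat float arrays of n elements compared for exact equality. The names mismatchFlag, compareKernel and matricesEqual are hypothetical. Thread 0 acts as the block's representative, both when it reads the global flag and when it writes the result back.

```cuda
#include <cuda_runtime.h>

// Global flag in device memory: 0 = no mismatch found yet, 1 = mismatch.
__device__ int mismatchFlag = 0;

__global__ void compareKernel(const float* a, const float* b, int n)
{
    __shared__ int skip;            // block-wide copy of the global flag
    __shared__ int blockMismatch;   // did any thread in this block differ?

    // The representative thread checks the global flag for the whole block.
    if (threadIdx.x == 0) {
        skip = *(volatile int*)&mismatchFlag;   // force a fresh load
        blockMismatch = 0;
    }
    __syncthreads();
    if (skip)
        return;   // the whole block leaves together; other blocks keep running

    // Compare this thread's element.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && a[i] != b[i])
        blockMismatch = 1;   // every writer stores the same value
    __syncthreads();

    // The representative thread publishes the block's result.
    if (threadIdx.x == 0 && blockMismatch)
        mismatchFlag = 1;
}

// Hypothetical host helper: true if the n-element matrices are identical.
bool matricesEqual(const float* hA, const float* hB, int n)
{
    if (n <= 0) return true;
    size_t bytes = n * sizeof(float);
    float *dA, *dB;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    int zero = 0;   // reset the flag before every launch
    cudaMemcpyToSymbol(mismatchFlag, &zero, sizeof(int));

    int threads = 128;   // smaller blocks, more of them
    int blocks = (n + threads - 1) / threads;
    compareKernel<<<blocks, threads>>>(dA, dB, n);

    int flag = 0;
    cudaMemcpyFromSymbol(&flag, mismatchFlag, sizeof(int));
    cudaFree(dA);
    cudaFree(dB);
    return flag == 0;
}
```

Blocks that are already running when the flag gets set still finish their comparison. Only blocks that start afterwards exit early, which is why many small blocks help.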

timmy08 · October 26, 2009, 10:51am

Hello,

thank you for your answers!

I now have a variable __device__ int flag = 0; defined outside my host code. I can read and write it from host code, but my kernel (which receives the variable as a parameter) does not change its value.

eyalhir74 · October 26, 2009, 12:21pm

You probably didn't allocate memory for it. You need to use cudaMalloc for it so that the kernel is able to see and use it.

Just declaring is not enough.
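If the kernel does take the flag as a parameter, that parameter has to be a device pointer, for example memory allocated with cudaMalloc. A __device__ variable passed by value will not work. Here is a minimal sketch under that assumption, using the hypothetical names compareWithFlagPtr and anyMismatch:

```cuda
#include <cuda_runtime.h>

// The flag arrives as a parameter, so it must be a device pointer,
// here memory obtained from cudaMalloc.
__global__ void compareWithFlagPtr(const float* a, const float* b, int n,
                                   int* flag)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && a[i] != b[i])
        *flag = 1;   // every writer stores the same value
}

// Hypothetical host helper: returns 1 if any element differs, else 0.
int anyMismatch(const float* hA, const float* hB, int n)
{
    if (n <= 0) return 0;
    size_t bytes = n * sizeof(float);
    float *dA, *dB;
    int* dFlag;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dFlag, sizeof(int));
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);
    cudaMemset(dFlag, 0, sizeof(int));   // flag = 0

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    compareWithFlagPtr<<<blocks, threads>>>(dA, dB, n, dFlag);

    int hFlag = 0;
    cudaMemcpy(&hFlag, dFlag, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dFlag);
    return hFlag;
}
```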

eyal

timmy08 · October 26, 2009, 12:47pm

Hi,

I declared the variable outside the host and kernel code like this:

__device__ int flag;

and then allocated it within the host code:

int initflag = 0;

cudaMalloc((void**)flag, sizeof(int));

cudaMemcpyToSymbol("flag", &initflag, sizeof(int), 0, cudaMemcpyHostToDevice);

Is this correct?

Sarnath · October 27, 2009, 6:00am

There is no need for cudaMalloc. The code above is completely wrong.

__device__ int flag;

int initflag = 0;

// CUDA 5.0 and later: pass the symbol itself, not its name as a string
cudaMemcpyToSymbol(flag, &initflag, sizeof(int), 0, cudaMemcpyHostToDevice);

The code above is enough to set the flag from the host.

Kernels can access flag directly and set it.

If you want to see the value set by the kernel, you need to call cudaMemcpyFromSymbol again in the host code.

Also, the data-direction argument looks odd for cudaMemcpyToSymbol: "ToSymbol" itself already implies host to device. I don't know how the runtime interprets it.
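About the direction argument: in the CUDA Runtime API, cudaMemcpyToSymbol takes a kind parameter. It defaults to cudaMemcpyHostToDevice and may also be cudaMemcpyDeviceToDevice, so it describes where the source buffer lives. The sketch below uses both kinds and then reads the symbol back with cudaMemcpyFromSymbol. It assumes CUDA 5.0 or later, where you pass the symbol itself instead of its name as a string such as "flag". symbolCopyDemo is a hypothetical name.

```cuda
#include <cuda_runtime.h>

__device__ int flag;

// The kind argument of cudaMemcpyToSymbol says where the *source* buffer
// lives; the destination is always the device symbol.
int symbolCopyDemo()
{
    // Source in host memory (cudaMemcpyHostToDevice is also the default).
    int hostInit = 0;
    cudaMemcpyToSymbol(flag, &hostInit, sizeof(int), 0, cudaMemcpyHostToDevice);

    // Source in device memory: cudaMemcpyDeviceToDevice.
    int hostSeven = 7;
    int* dSeven = nullptr;
    cudaMalloc(&dSeven, sizeof(int));
    cudaMemcpy(dSeven, &hostSeven, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(flag, dSeven, sizeof(int), 0, cudaMemcpyDeviceToDevice);

    // Read the symbol back on the host.
    int result = 0;
    cudaMemcpyFromSymbol(&result, flag, sizeof(int), 0, cudaMemcpyDeviceToHost);
    cudaFree(dSeven);
    return result;
}
```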

timmy08 · October 27, 2009, 12:59pm

Hello,

Thank you for this! I tried it, but it seems that my kernel cannot change/set the value of flag. I pass flag as an argument to the comparison kernel.

gemini0x4d · October 27, 2009, 1:07pm

Hi,

Funny, I've implemented the same routine, because I wanted to check two matrices for errors. I did it the way suggested above, and it works perfectly.

Did you use cudaMemcpyFromSymbol to read out the flag in your host code?

Edit: If you declared your flag variable as Sarnath showed you, it IS accessible by all kernels in that file.

Best regards,

gemini

Sarnath · October 28, 2009, 5:15am

That's the mistake: kernels can access the flag variable directly. You don't need to pass it as an argument.

All you have to do is declare "__device__ int flag" ahead of the kernel in the same file (or, more precisely, the same compilation unit).

If you want to pass the address, you first need to call cudaGetSymbolAddress to get the device address of flag. "&flag" will NOT help in this case.
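Here is a sketch of the cudaGetSymbolAddress route. In host code, &flag does not give a valid device address. The sketch therefore queries the device address first and passes it to a kernel that only knows the flag through a pointer. setFlagThroughPointer and setViaSymbolAddress are hypothetical names, and the sketch assumes the CUDA 5.0+ syntax that passes the symbol directly.

```cuda
#include <cuda_runtime.h>

__device__ int flag;

// A kernel that only knows the flag through a pointer parameter.
__global__ void setFlagThroughPointer(int* flagPtr)
{
    *flagPtr = 1;
}

// Obtain flag's device address on the host, then pass it to the kernel.
int setViaSymbolAddress()
{
    int zero = 0;
    cudaMemcpyToSymbol(flag, &zero, sizeof(int));

    int* dFlag = nullptr;
    cudaGetSymbolAddress((void**)&dFlag, flag);  // device address of flag
    setFlagThroughPointer<<<1, 1>>>(dFlag);

    int result = 0;
    cudaMemcpyFromSymbol(&result, flag, sizeof(int));  // ordered after the kernel
    return result;
}
```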

I hope you see the point now.

It's kind of fuzzy, but yeah, that's the way it is.
