Constant Memory - When are variables cleaned? Multiple kernels for same constant...

Here is the situation -

I have two kernels, K_Compute and K_Display, which I call in that sequence. I also have a structure that I declare in constant memory (__constant__) and copy data into using the cudaMemcpyToSymbol() function. I want to use the same structure in the K_Display kernel too.

When I try to, the compiler has no problem with it, but the program crashes at the instruction where I try to access it.

To check what I am doing, I tried a cudaMemcpyFromSymbol() AFTER the K_Display. Part of the data that comes back is garbage, which suggests the structure's memory is being overwritten. Is there a way I can prevent this?

Main reasons I do not want to copy the data again:

* Increased memory transfer activity - one of the slowest components of the solution.

* A change to the OO design we currently have - i.e. the class that calls the 2nd kernel would also need access to the data.

If I remove the instructions that access the constant memory data in the display kernel, the program runs fine.

Here's a succinct pseudo-code model of what I do -

Kernel side - 

__constant__ MyData data;   // MyData stands in for my real struct

__global__ void K_Compute()
{
    use data;
}

__global__ void K_Display()
{
    use data;
}

-------

Class 1 - 

Fill up data_h;

cudaMemcpyToSymbol(data, &data_h, sizeof(data_h));

 K_Compute<<<grid,block>>>();

some more computation on CPU.

 Class2.Display();

---------

Class 2 - 

Display()

{

  K_Display<<<grid,block>>>();

}
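
In case it helps, here is a stand-alone version of the check I described (a sketch with made-up sizes and a stand-in MyData, but the call sequence is exactly what I do):

#include <cstdio>
#include <cstring>

struct MyData { int width; int height; };   // stand-in for my real structure

__constant__ MyData data;

__global__ void K_Compute() { /* reads data */ }
__global__ void K_Display() { /* reads data */ }

int main()
{
    MyData data_h = { 640, 480 };
    cudaMemcpyToSymbol(data, &data_h, sizeof(MyData));

    K_Compute<<<1, 1>>>();
    cudaThreadSynchronize();

    // read the symbol back and compare against the original host copy
    MyData check;
    cudaMemcpyFromSymbol(&check, data, sizeof(MyData));
    printf("intact after K_Compute: %d\n", memcmp(&check, &data_h, sizeof(MyData)) == 0);

    K_Display<<<1, 1>>>();
    cudaThreadSynchronize();
    cudaMemcpyFromSymbol(&check, data, sizeof(MyData));
    printf("intact after K_Display: %d\n", memcmp(&check, &data_h, sizeof(MyData)) == 0);

    return 0;
}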

Any and all help appreciated. Thank you!

Constant memory is never ‘cleaned’ or overwritten by the CUDA API unless you tell it to. If it gets corrupted, you probably have some other bug in your kernel that overwrites memory…

I probably should have mentioned this in my earlier post. The error that I get when I try to access the constant is -
“Invalid argument”.

wumpus:
Thanks for your reply. I will double check that my first kernel does not do anything bad to the data.
But isn’t constant memory out of bounds for a kernel to mess with? I mean, a kernel cannot write anything to constant memory, so how would I change it?
Also, I do not touch the constant memory anywhere other than the place where I do the cudaMemcpyToSymbol().

But anyway, I will check again. Thanks!

After double-checking, I can assure you that I do not do any harm to the structure in my first kernel (nor anywhere else once the data is copied to the device).
I have not yet solved the problem.

Here are my findings and my guesses :thumbsdown: - I hope to turn these into things I know and not just guesses.

The structure I am copying up contains various integers and characters, and I can access these without a problem in my second kernel too.

The element whose access crashes my system is a char pointer, declared as volatile char * bitmap_device;

In my second kernel, when I try to access data.bitmap_device[0], I crash.

Does that mean CUDA ‘cleans’ only the pointer-type variables? Am I missing something?

If this is not the right way, what is?
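
For concreteness, this is the stripped-down pattern I am assuming should work (made-up names and sizes; the point is that bitmap_device holds a cudaMalloc'ed device address at the moment the struct is copied up):

struct MyData
{
    int width;
    volatile char * bitmap_device;   // meant to hold a device address
};

__constant__ MyData data;

__global__ void K_Display()
{
    char c = data.bitmap_device[0];   // the access that crashes in my real code
    (void)c;
}

int main()
{
    MyData data_h;
    data_h.width = 640;

    // the bitmap itself lives in global memory; only its address goes into the struct
    char * bitmap = 0;
    cudaMalloc((void **)&bitmap, 640);
    data_h.bitmap_device = bitmap;

    cudaMemcpyToSymbol(data, &data_h, sizeof(MyData));

    K_Display<<<1, 1>>>();
    cudaThreadSynchronize();

    cudaFree(bitmap);
    return 0;
}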

If you’re working with structures, you may have problems with the alignment of members inside them, so the structure size on the host and on the device may differ.

Anyway, it is almost impossible to say without looking at the source code first.
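
One quick check is to compare sizeof on both sides. A sketch (untested; replace MyData with your real structure):

#include <cstdio>

struct MyData { int i; volatile char * p; };   // replace with your real structure

__global__ void sizeofKernel(int * out)
{
    *out = (int)sizeof(MyData);   // the size as the device compiler sees it
}

void checkLayout()
{
    int * d_size = 0;
    cudaMalloc((void **)&d_size, sizeof(int));
    sizeofKernel<<<1, 1>>>(d_size);

    int deviceSize = 0;
    cudaMemcpy(&deviceSize, d_size, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_size);

    printf("host sizeof(MyData) = %d, device sizeof(MyData) = %d\n",
           (int)sizeof(MyData), deviceSize);
}

If the two numbers differ, the bytes you copy with cudaMemcpyToSymbol() will not line up with the members your kernel reads.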

You probably know this, but both kernels have to run from the same host thread, because CUDA creates a separate context (with its own memory space) for each host thread.

Hello, tanmay.
I have a similar problem. I define an array (in global memory) in transformKernel, then use it as input to zoneKernel, but zoneKernel fails to get the correct data.
If you have solved your problem, please send me a message, thanks :)

Kernel side -

__global__ void transformKernel(unsigned char* data_out)
{
    define data_out;
}

__global__ void zoneKernel(unsigned char* data_in)
{
    use data_in;
}

-------

runTest()
{
    transformKernel<<< dimGrid, dimBlock >>>( d_pResult );
    CUDA_SAFE_CALL( cudaThreadSynchronize() );

    CUDA_SAFE_CALL( cudaMemcpy(d_pRlt, d_pResult, BLOCKSIZE * sizeof(unsigned char), cudaMemcpyDeviceToDevice) );

    zoneKernel<<< zoneGrid, zoneBlock >>>( d_pRlt );
}

Any reply to this problem is welcome!

As far as I know, you cannot define your memory inside your kernel; you have to allocate it outside the kernel (and fill it in your kernel).

If you copy d_pResult to host memory, do you see the right values then? My guess is you don't.

So do something like:

__global__ void transformKernel(unsigned char* data_out)
{
    fill data_out;
}

__global__ void zoneKernel(unsigned char* data_in)
{
    use data_in;
}

-------

runTest()
{
    unsigned char* d_pResult = 0;
    unsigned char* d_pRlt = 0;

    // allocate the device buffers before launching anything
    CUDA_SAFE_CALL( cudaMalloc((void**)&d_pResult, BLOCKSIZE * sizeof(unsigned char)) );

    transformKernel<<< dimGrid, dimBlock >>>( d_pResult );
    CUDA_SAFE_CALL( cudaThreadSynchronize() );

    CUDA_SAFE_CALL( cudaMalloc((void**)&d_pRlt, BLOCKSIZE * sizeof(unsigned char)) );
    CUDA_SAFE_CALL( cudaMemcpy(d_pRlt, d_pResult, BLOCKSIZE * sizeof(unsigned char), cudaMemcpyDeviceToDevice) );

    zoneKernel<<< zoneGrid, zoneBlock >>>( d_pRlt );

    CUDA_SAFE_CALL( cudaFree(d_pResult) );
    CUDA_SAFE_CALL( cudaFree(d_pRlt) );
}
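
By the way, the device-to-device cudaMemcpy is not strictly necessary: d_pResult already lives in global memory, so you could pass it straight to zoneKernel. I kept your structure so the change is easier to see.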

deneis, in fact I did define the memory just the way you listed; I think I had made a mistake in the data filling rather than in copying data between the kernels.

After carefully rewriting my code, the data finally comes out right.

Thank you for your good advice. Have a happy holiday :D

Never mind.

Sorry that I did not reply for about a month after starting this thread; I had been on vacation. B) ;)
Thank you to all those who have suggested things to try. I will let you know what happens.

icefiring - Did you solve the problem?