Memory management in the device

Is there any caching in the device's memory?

Hello everyone,

I'm getting started with NVIDIA's CUDA platform and have been running some tests. I'm basically testing a vector sum using the following code (showing only what I think are the relevant parts):

__global__ void vecAdd(float *A, float *B, float *C) {
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

int main(int argc, char **argv) {
    cudaError_t error;

    // Declare local stuff
    float A[5] = {2.0f, 3.0f, 4.0f, 5.0f, 6.0f};
    float B[5] = {2.0f, 4.0f, 6.0f, 8.0f, 10.0f};
    float C[5];

    // Move stuff to device
    float *Ad, *Bd, *Cd;
    cudaMalloc((void **)&Ad, 5 * sizeof(float));
    cudaMemcpy(Ad, A, 5 * sizeof(float), cudaMemcpyHostToDevice);
    cudaMalloc((void **)&Bd, 5 * sizeof(float));
    cudaMemcpy(Bd, B, 5 * sizeof(float), cudaMemcpyHostToDevice);
    cudaMalloc((void **)&Cd, 5 * sizeof(float));

    // Do stuff
    vecAdd<<<1, 4>>>(Ad, Bd, Cd);

    // Check for errors
    error = cudaGetLastError();
    if (error != cudaSuccess) {
        printf("Error: %s\n", cudaGetErrorString(error));
    }

    // Retrieve stuff
    cudaMemcpy(C, Cd, 5 * sizeof(float), cudaMemcpyDeviceToHost);

    printf("C: {%.2f, %.2f, %.2f, %.2f, %.2f}\n", C[0], C[1], C[2], C[3], C[4]);

    cudaFree(Ad);
    cudaFree(Bd);
    cudaFree(Cd);

    return 0;
}

The code compiles fine and runs without errors, but the behavior is strange: if I modify the source to decrease the number of threads, I still get the same results in vector C. That seemed odd, so after trying thread counts above and below 5 without any change, I even changed the contents of array B. Nothing changed either. That makes me think there is a previously computed result sitting in device memory which I need to erase to get the expected behavior, but I can't find anything supporting this theory in the documentation. Am I missing something?

Some other interesting experiments: I rebooted the computer and compiled the code with only one thread, and it gave a reasonable result, with nan in all values but the first one. Then I changed it to two threads, and it gave nan in all values but the first two. Then I tried with 3 and everything was normal, but when I started decreasing the number again, the output wouldn't change.

Additional system info which might be helpful:

  • Nvidia Driver: 177.67 x86_64
  • CUDA Toolkit: 2.0 for Ubuntu 7.10 x86_64
  • System: Frugalware Linux x86_64
  • Device: GeForce 8800GT 256MB

Thanks for any help you might give!

David E. Narváez

This is due to the way the hardware is built: the multiprocessors are 8-wide SIMD units that execute a batch of 32 threads (a warp) over 4 clock cycles. Therefore there is a finite granularity to how many threads actually run. I assume it's multiples of 32 threads, but it might be 16. Actually, I would have expected your code to cause an "unspecified launch failure", since it references unallocated memory. IMO you should always pass some (template) parameter N and check idx < N at the beginning of your kernel, as in the sketch below.
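A minimal sketch of that guard, assuming an extra element-count parameter N is added to the kernel signature (N is not in the original code):

__global__ void vecAdd(float *A, float *B, float *C, int N) {
    int i = threadIdx.x;
    // Threads whose index falls past the end of the arrays do
    // nothing, instead of reading and writing unallocated memory.
    if (i < N) {
        C[i] = A[i] + B[i];
    }
}

With this guard you can launch any thread count at or above the element count, e.g. vecAdd<<<1, 32>>>(Ad, Bd, Cd, 5);, and the extra threads are harmless.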

Well, it is quite simple.

The memory you allocate is the same each time. So when you have had 5 threads write 5 values, you have 5 values in memory. If after that only 3 threads write values, the last 2 values remain whatever they were before. If you then copy all 5 of them back to the host, you will see the same values as before in the last 2 positions.
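A quick way to confirm this (a sketch reusing the Cd buffer from the original code): clear the output buffer before each launch, so nothing from an earlier run can survive into the copy back to the host.

// Zero the output buffer before launching. All-zero bytes are
// 0.0f in IEEE 754, so any slot no thread writes will read back
// as 0 instead of whatever the previous launch left behind.
cudaMemset(Cd, 0, 5 * sizeof(float));
vecAdd<<<1, 3>>>(Ad, Bd, Cd);
// After the device-to-host copy, C[3] and C[4] are now 0.0
// rather than the stale sums from an earlier 5-thread launch.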