cudaMemset() problem

Has anyone else noticed that in the 0.8b package cudaMemset() does nothing? For me it appears to leave the specified memory buffer on the device unchanged.

For example:

CUDA_SAFE_CALL( cudaMalloc( (void**) &DevOutBuf, 1024));
CUDA_SAFE_CALL( cudaMemset( (void *) DevOutBuf, 0, 1024));

does not set every byte to 0 as it should. If I fill the buffer with a pattern and then try to cudaMemset() it, the original bytes are not overwritten.
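For comparison, here is a host-only sketch in plain C of the result a working cudaMemset(DevOutBuf, 0, 1024) should produce: fill a buffer with a pattern, clear it, and check that no byte of the pattern survives. all_bytes_equal() and host_reference_clear() are hypothetical helper names for illustration, not part of the CUDA API. The idea is to run the same all_bytes_equal() check on a buffer copied back from the device with cudaMemcpy(..., cudaMemcpyDeviceToHost).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if every one of the n bytes at buf equals (unsigned char)value,
   otherwise 0. Use it on a host copy of the device buffer. */
int all_bytes_equal(const void *buf, size_t n, int value)
{
    const unsigned char *p = (const unsigned char *)buf;
    const unsigned char want = (unsigned char)value;
    for (size_t i = 0; i < n; i++) {
        if (p[i] != want) {
            return 0;
        }
    }
    return 1;
}

/* Host-side reference for the expected behaviour: write a pattern, clear it
   with the standard memset(), and confirm every byte is now 0. A working
   cudaMemset(dev, 0, n) should leave the device buffer in the same state. */
int host_reference_clear(unsigned char *buf, size_t n)
{
    assert(buf != NULL);
    memset(buf, 0xAB, n);   /* the "pattern" */
    memset(buf, 0, n);      /* what cudaMemset(dev, 0, n) should mirror */
    return all_bytes_equal(buf, n, 0);
}
```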

I have noticed this as well. As a workaround, I’m currently doing a cudaMemcpy() from a large host buffer of zeroes.

For the record I am using the recommended driver that came with the CUDA package (97.73).

I also noticed that cudaMemset() is OK in the emulator but fails on the device.

Yeah, I’m having this problem as well, and so are others I know who write CUDA code :( Has anyone gotten this to work?

I just double-checked: cudaMemset(…) and cudaMemset2D(…) both work fine on Linux, using the recommended 97.51 driver.
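As a reminder of what cudaMemset2D(devPtr, pitch, value, width, height) is supposed to do, here is a host-memory analogue in plain C: for each of height rows, it sets the first width bytes to value, with rows starting pitch bytes apart, and leaves any padding between width and pitch untouched. host_memset2d() is a hypothetical name; it only mimics the documented semantics on ordinary host memory and does not call CUDA.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Host analogue of cudaMemset2D(ptr, pitch, value, width, height):
   each row is `width` bytes of data followed by (pitch - width) bytes of
   padding. Only the data bytes are written; padding is left as-is. */
void host_memset2d(void *ptr, size_t pitch, int value, size_t width, size_t height)
{
    assert(width <= pitch);   /* a row cannot be wider than its pitch */
    unsigned char *base = (unsigned char *)ptr;
    for (size_t row = 0; row < height; row++) {
        memset(base + row * pitch, value, width);   /* byte-wise, like memset() */
    }
}
```

For example, with a pitch of 8 bytes and a width of 5, only the first 5 bytes of each row change; bytes 5 to 7 keep whatever they held before.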

Is everyone who is having problems using Windows?

Same problem! cudaMemset() isn’t doing anything. The weird part is that it doesn’t always happen; it seems random… 4 years and nothing solved?

Works without a flaw for me. It’s either your code or your driver installation.

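Try the following code:

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <cuda_runtime.h>

int main(void)
{
    int *hbuff1 = (int*) malloc(1024);
    int *hbuff2 = (int*) malloc(1024);
    int *zeros = (int*) calloc(1, 1024);   /* reference buffer of 1024 zero bytes */
    int *dbuff = NULL;

    memset(hbuff1, 5, 1024);   /* fill with a pattern: 1024 bytes, not 1025, to stay inside the buffer */

    cudaMalloc((void**) &dbuff, 1024);
    cudaMemcpy((void *)dbuff, (void *)hbuff1, 1024, cudaMemcpyHostToDevice);
    cudaMemset((void *) dbuff, 0, 1024);   /* should clear every byte on the device */
    cudaMemcpy((void *)hbuff2, (void *)dbuff, 1024, cudaMemcpyDeviceToHost);

    assert(memcmp(hbuff1, hbuff2, 1024) != 0);   /* the pattern is gone... */
    assert(memcmp(hbuff2, zeros, 1024) == 0);    /* ...and every byte is now 0 */

    cudaFree(dbuff);
    free(hbuff1);
    free(hbuff2);
    free(zeros);
    return 0;
}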

I think this is a bug in cudaMemset(), but feel free to point out my error.
The following code prints a string of 16843009s instead of 1s on both Ubuntu 11.04 amd64 and Windows 7 x64 with the latest dev drivers.

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#define GRID_SIZE 128

int main(int argc, char** argv)
{
    int* gridOccupancy = 0;

    cudaMalloc((void**)&gridOccupancy, GRID_SIZE*GRID_SIZE*sizeof(int));
    cudaMemset(gridOccupancy, 1, GRID_SIZE*GRID_SIZE*sizeof(int));

    int* hgridOccupancy = (int*) calloc(GRID_SIZE*GRID_SIZE, sizeof(int));
    cudaMemcpy(hgridOccupancy, gridOccupancy, GRID_SIZE*GRID_SIZE*sizeof(int), cudaMemcpyDeviceToHost);

    for (int m = 0; m < GRID_SIZE*GRID_SIZE; m++) {
        printf("%d %d\n", m, hgridOccupancy[m]);
    }

    free(hgridOccupancy);

    /* Cleanup */
    cudaFree(gridOccupancy);

    return EXIT_SUCCESS;
}

OK. There is no error. cudaMemset() works like memset(): it sets bytes, not words. In your code sample you are setting each byte of memory to 1, not each int, so every 4-byte int ends up holding 0x01010101, which is 16843009 in decimal. The results you are seeing are expected.
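To make the byte-versus-word distinction concrete, here is a host-only C sketch using the standard memset(), whose byte-wise behaviour cudaMemset() mirrors. fill_bytes(), fill_words() and replicated_byte() are hypothetical helper names for illustration. Filling an int32_t array byte-wise with 1 gives 16843009 in every element, matching the output reported above; filling it word-wise gives 1. On the device, a word-wise fill needs something other than cudaMemset(), for example a small kernel or the driver API's cuMemsetD32(), which takes a count of 32-bit elements rather than bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* memset-style fill: the low byte of `value` is written into every byte,
   so value 1 turns each 4-byte element into 0x01010101. */
void fill_bytes(int32_t *a, size_t count, int value)
{
    assert(a != NULL);
    memset(a, value, count * sizeof *a);
}

/* Word-wise fill: every element receives `value` itself. */
void fill_words(int32_t *a, size_t count, int32_t value)
{
    assert(a != NULL);
    for (size_t i = 0; i < count; i++) {
        a[i] = value;
    }
}

/* The 32-bit value obtained by repeating byte b in all four byte positions. */
int32_t replicated_byte(unsigned char b)
{
    return (int32_t)((uint32_t)b * 0x01010101u);
}
```

Zero is the case where the two fills agree: an element whose bytes are all 0 is the int 0 (and the IEEE 754 float +0.0), which is why cudaMemset(ptr, 0, n) is fine for clearing numeric arrays.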