cudaMalloc3D - What am I doing wrong? Am I making false assumptions?

I have a piece of code here that does not behave as I expect.

I assume that the memory I copy back to the host should be initialized entirely to zero, but I also get nonzero values.

Either my assumption is false, I did something wrong, or cudaMemset3D is broken.

System is Windows XP x64. CUDA 2.3.

Can someone test and confirm this behavior / bug?

If it helps, you can see one example of 3D memory usage here: http://forums.nvidia.com/index.php?showtopic=157856

Thank you fcs for your hint.

The problem, however, is not the 3D memory usage itself but the cudaMemset3D function. In my opinion it does not behave as it should, at least on Windows XP x64.

I used cudaMemset((void*)d_volume, 0, pitch_x * dim_y * dim_z) instead of cudaMemset3D and now everything works correctly.
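For anyone who wants to try the same workaround, here is a minimal sketch of it. The dimensions, the float element type and the helper name memsetVolume are assumptions for illustration and are not from the original code. It relies on a cudaMalloc3D allocation being one contiguous block of pitch * height * depth bytes, so a single cudaMemset over that range also overwrites the row padding.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: fill a whole pitched 3D allocation (row padding included)
// with one byte value. dim_y and dim_z are the height and depth that were
// passed to cudaMalloc3D.
cudaError_t memsetVolume(cudaPitchedPtr vol, int value, size_t dim_y, size_t dim_z) {
    return cudaMemset(vol.ptr, value, vol.pitch * dim_y * dim_z);
}

int main() {
    // Assumed dimensions: 100 floats per row, 60 rows, 30 slices.
    const size_t dim_x = 100, dim_y = 60, dim_z = 30;
    cudaExtent extent = make_cudaExtent(dim_x * sizeof(float), dim_y, dim_z);

    cudaPitchedPtr d_volume;
    cudaError_t err = cudaMalloc3D(&d_volume, extent);
    if (err == cudaSuccess) err = memsetVolume(d_volume, 0, dim_y, dim_z);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("cleared %zu bytes (pitch %zu)\n",
                d_volume.pitch * dim_y * dim_z, d_volume.pitch);
    cudaFree(d_volume.ptr);
    return 0;
}
```

Because the padding is overwritten too, this only fits when the whole allocation should get the same value.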

Can someone test the code above?

I tried the same lines of code again and now they do not show the same problem as yesterday …

Yesterday I got uninitialized elements and today they are gone … very strange.

[Update]

Yet again the same thing … uninitialized data … it seems to appear randomly.

The choice of 0 was bad. Using a different fill value, 21 for example, shows the error in every run.

I modified the code above.
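The test code itself is not reproduced in this thread. A minimal sketch of such a test might look like the following, assuming a byte volume with made-up dimensions and a current toolkit: allocate with cudaMalloc3D, fill with 21 via cudaMemset3D, copy back with cudaMemcpy3D and count the bytes that differ. The CHECK macro is a hypothetical helper.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

int main() {
    // Assumed volume size; the original dimensions are not given.
    const size_t dim_x = 100, dim_y = 60, dim_z = 30;
    const int fill = 21;  // nonzero, so stale zeros cannot hide the problem

    // For linear memory the extent width is given in bytes.
    cudaExtent extent = make_cudaExtent(dim_x * sizeof(unsigned char), dim_y, dim_z);
    cudaPitchedPtr d_volume;
    CHECK(cudaMalloc3D(&d_volume, extent));
    CHECK(cudaMemset3D(d_volume, fill, extent));

    // Copy back into a tightly packed host buffer (host pitch == dim_x).
    std::vector<unsigned char> h_volume(dim_x * dim_y * dim_z, 0);
    cudaMemcpy3DParms p = {};
    p.srcPtr = d_volume;
    p.dstPtr = make_cudaPitchedPtr(h_volume.data(), dim_x, dim_x, dim_y);
    p.extent = extent;
    p.kind = cudaMemcpyDeviceToHost;
    CHECK(cudaMemcpy3D(&p));

    // Count bytes that were not set to the fill value.
    size_t bad = 0;
    for (unsigned char v : h_volume)
        if (v != fill) ++bad;
    std::printf("%zu of %zu bytes differ from %d\n", bad, h_volume.size(), fill);

    CHECK(cudaFree(d_volume.ptr));
    return bad == 0 ? 0 : 1;
}
```

On a runtime where cudaMemset3D works as documented, the count should be zero.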

I got a response from NVIDIA: this bug will be fixed in the next release of CUDA, probably 3.0, but I’m not sure what they mean by "next release".

Have a look at my post…

http://forums.nvidia.com/index.php?showtopic=165400

Can we get a response from NVIDIA on this? I don’t believe it is fixed in 3.0, but I could be wrong. I too am a victim of this bug.

This bug is probably not fixed yet. I filed a bug report long ago in the registered developers zone. The latest response was

There has been no update on this bug yet, so I assume the fix is not available in CUDA 3.0.

If cudaMemset3D is the problem, you could consider writing that function as a kernel of your own. If it’s really biting, you should bite it back…
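A rough sketch of what such a hand-written replacement could look like, assuming a current toolkit and a volume allocated with cudaMalloc3D. The names memset3dKernel and myMemset3D, the block size and the test dimensions are all made up for illustration. Each thread writes one (x, y) byte position in every slice, so the row padding stays untouched.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread handles one (x, y) byte position and walks through all z slices.
__global__ void memset3dKernel(unsigned char* base, size_t pitch, size_t slicePitch,
                               unsigned char value,
                               size_t widthBytes, size_t height, size_t depth) {
    size_t x = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    size_t y = (size_t)blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= widthBytes || y >= height) return;
    for (size_t z = 0; z < depth; ++z)
        base[z * slicePitch + y * pitch + x] = value;
}

// Hypothetical stand-in taking the same arguments as cudaMemset3D.
// Assumes a non-empty extent and a grid that fits the device limits.
cudaError_t myMemset3D(cudaPitchedPtr vol, int value, cudaExtent extent) {
    dim3 block(32, 8);
    dim3 grid((unsigned)((extent.width + block.x - 1) / block.x),
              (unsigned)((extent.height + block.y - 1) / block.y));
    memset3dKernel<<<grid, block>>>(static_cast<unsigned char*>(vol.ptr),
                                    vol.pitch, vol.pitch * vol.ysize,
                                    static_cast<unsigned char>(value),
                                    extent.width, extent.height, extent.depth);
    cudaError_t err = cudaGetLastError();  // catches launch errors
    return err != cudaSuccess ? err : cudaDeviceSynchronize();
}

int main() {
    // Assumed dimensions in bytes, rows and slices.
    cudaExtent extent = make_cudaExtent(100, 60, 30);
    cudaPitchedPtr d_volume;
    cudaError_t err = cudaMalloc3D(&d_volume, extent);
    if (err == cudaSuccess) err = myMemset3D(d_volume, 21, extent);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("volume filled\n");
    cudaFree(d_volume.ptr);
    return 0;
}
```

The mapping of one thread per byte is chosen for clarity, not speed; writing wider words per thread would likely be faster.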

The workaround is to use either cudaMemset or cudaMemset2D. I noticed that cudaMemset is quite a bit faster than cudaMemset2D.
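As a sketch of the cudaMemset2D variant: if the volume comes from cudaMalloc3D, the rows of consecutive slices follow each other at a constant pitch, so it can be treated as one 2D region with height * depth rows. The helper name memsetVolume2D and the dimensions are assumptions. Unlike a flat cudaMemset over the whole allocation, this leaves the row padding alone.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: set a cudaMalloc3D volume through cudaMemset2D by viewing
// it as a single 2D region of extent.height * extent.depth rows.
// Assumes vol.ysize == extent.height, i.e. the extent covers the whole allocation.
cudaError_t memsetVolume2D(cudaPitchedPtr vol, int value, cudaExtent extent) {
    return cudaMemset2D(vol.ptr, vol.pitch, value,
                        extent.width, extent.height * extent.depth);
}

int main() {
    // Assumed dimensions in bytes, rows and slices.
    cudaExtent extent = make_cudaExtent(100, 60, 30);
    cudaPitchedPtr d_volume;
    cudaError_t err = cudaMalloc3D(&d_volume, extent);
    if (err == cudaSuccess) err = memsetVolume2D(d_volume, 21, extent);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("volume filled via cudaMemset2D\n");
    cudaFree(d_volume.ptr);
    return 0;
}
```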

The main issue with this bug isn’t that it would be hard to write a replacement but that you should be able to rely on basic functions. I incorrectly assumed that memset would be very stable, so I spent most of my time looking elsewhere for the problem. Once I knew it was a bug in cudaMemset3D, the fix took no more than 2 minutes; finding the cause was what took a while.

Yes, truly a better workaround… And yes again, it must have been VERY difficult to locate the bug…
