Error in CUDA programming guide?

rhgong · March 24, 2010, 2:19am

The following code was taken from CUDA 2.3 programming guide. I found that it is the same in the latest released CUDA 3.0 programming guide.
To me, the code has an error: make_cudaExtent(64, 64, 64) should be make_cudaExtent(64 * sizeof(float), 64, 64)

Can anyone confirm whether I’m correct or not? Thanks!

// Host code
cudaPitchedPtr devPitchedPtr;
cudaExtent extent = make_cudaExtent(64, 64, 64);
cudaMalloc3D(&devPitchedPtr, extent);
MyKernel<<<100, 512>>>(devPitchedPtr, extent);

// Device code
__global__ void MyKernel(cudaPitchedPtr devPitchedPtr, cudaExtent extent)
{
    char* devPtr = devPitchedPtr.ptr;
    size_t pitch = devPitchedPtr.pitch;
    size_t slicePitch = pitch * extent.height;
    for (int z = 0; z < extent.depth; ++z)
    {
        char* slice = devPtr + z * slicePitch;
        for (int y = 0; y < extent.height; ++y)
        {
            float* row = (float*)(slice + y * pitch);
            for (int x = 0; x < extent.width; ++x)
            {
                float element = row[x];
                .
                .
                .

Cliff_Woolley · March 25, 2010, 5:38pm

You are correct. Thanks for catching this… we’ll correct it for the next version of the programming guide.

–Cliff

effepi · March 25, 2010, 10:50pm

The following code was taken from CUDA 2.3 programming guide. I found that it is the same in the latest released CUDA 3.0 programming guide.

To me, the code has an error: make_cudaExtent(64, 64, 64) should be make_cudaExtent(64 * sizeof(float), 64, 64)

Can anyone confirm whether I’m correct or not? Thanks!

// Host code
cudaPitchedPtr devPitchedPtr;
cudaExtent extent = make_cudaExtent(64 * sizeof(float), 64, 64);
…
…
…
            float* row = (float*)(slice + y * pitch);
            for (int x = 0; x < extent.width; ++x)
            {
                float element = row[x];
                .
                .
                .
Here, with your change, I believe “extent.width” is equal to “64 * sizeof(float)”, so I’ll tend to use

for (int x = 0; x < extent.width / sizeof(float); ++x)
{
    float element = row[x];

so that it runs the loop only 64 times.

Am I right?

Cliff_Woolley · March 25, 2010, 10:53pm

Here, with your change, I believe “extent.width” is equal to “64 * sizeof(float)”, so I’ll tend to use
for (int x = 0; x < extent.width / sizeof(float); ++x)
{
    float element = row[x];

so that it runs the loop only 64 times.

Oh, yes, also another good catch. Thanks again.
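
For reference, here is a minimal sketch that combines both corrections from this thread: the extent width given in bytes, and the inner loop bound divided by sizeof(float). The kernel name FillKernel, the one-thread-per-row launch, the fill value, and the use of cudaDeviceSynchronize (which comes from later toolkits than the 2.3/3.0 releases discussed here) are assumptions made for illustration, not text from the programming guide.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: block index selects the slice (z), thread index the row (y).
__global__ void FillKernel(cudaPitchedPtr devPitchedPtr, cudaExtent extent, float value)
{
    size_t z = blockIdx.x;
    size_t y = threadIdx.x;
    if (z >= extent.depth || y >= extent.height) return;

    char* devPtr = static_cast<char*>(devPitchedPtr.ptr);
    size_t pitch = devPitchedPtr.pitch;         // bytes per padded row
    size_t slicePitch = pitch * extent.height;  // bytes per 2D slice

    // Pointer arithmetic is done in bytes, then cast to the element type.
    float* row = reinterpret_cast<float*>(devPtr + z * slicePitch + y * pitch);

    // extent.width is in bytes, so convert it to an element count.
    size_t widthInElements = extent.width / sizeof(float);
    for (size_t x = 0; x < widthInElements; ++x)
        row[x] = value;
}

int main()
{
    const unsigned int N = 64;  // elements per dimension

    // For linear (non-array) memory, the extent width is given in bytes.
    cudaExtent extent = make_cudaExtent(N * sizeof(float), N, N);

    cudaPitchedPtr devPitchedPtr;
    cudaError_t err = cudaMalloc3D(&devPitchedPtr, extent);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc3D failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    FillKernel<<<N, N>>>(devPitchedPtr, extent, 1.0f);
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(devPitchedPtr.ptr);
        return 1;
    }

    // The pitch may be larger than the requested row width because of padding.
    std::printf("row width: %zu bytes, pitch: %zu bytes\n",
                extent.width, devPitchedPtr.pitch);

    cudaFree(devPitchedPtr.ptr);
    return 0;
}
```

With make_cudaExtent(64, 64, 64) instead, each row would be only 64 bytes wide, which holds 16 floats, so a loop over 64 elements per row would run past the requested row width.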

bitminer · June 29, 2010, 4:50pm

I completely concur; see my post:

cudaMemcpy3D and cudaMalloc, cudaDeviceToDevice copies

I wish I had seen this post first.
