rhgong
March 24, 2010, 2:19am
1
The following code was taken from CUDA 2.3 programming guide. I found that it is the same in the latest released CUDA 3.0 programming guide.
To me, the code has an error: make_cudaExtent(64, 64, 64) should be make_cudaExtent(64 * sizeof(float), 64, 64)
Can anyone confirm whether I’m correct or not? Thanks!
// Host code
cudaPitchedPtr devPitchedPtr;
cudaExtent extent = make_cudaExtent(64, 64, 64);
cudaMalloc3D(&devPitchedPtr, extent);
MyKernel<<<100, 512>>>(devPitchedPtr, extent);
// Device code
__global__ void MyKernel(cudaPitchedPtr devPitchedPtr, cudaExtent extent)
{
    char* devPtr = (char*)devPitchedPtr.ptr;
    size_t pitch = devPitchedPtr.pitch;
    size_t slicePitch = pitch * extent.height;
    for (int z = 0; z < extent.depth; ++z)
    {
        char* slice = devPtr + z * slicePitch;
        for (int y = 0; y < extent.height; ++y)
        {
            float* row = (float*)(slice + y * pitch);
            for (int x = 0; x < extent.width; ++x)
            {
                float element = row[x];
.
.
.
You are correct. Thanks for catching this… we’ll correct it for the next version of the programming guide.
–Cliff
effepi
March 25, 2010, 10:50pm
3
// Host code
cudaPitchedPtr devPitchedPtr;
cudaExtent extent = make_cudaExtent(64 * sizeof(float), 64, 64);
…
…
…
            float* row = (float*)(slice + y * pitch);
            for (int x = 0; x < extent.width; ++x)
            {
                float element = row[x];
.
.
.
With your change, “extent.width” equals “64 * sizeof(float)”, which is a width in bytes rather than in elements, so I would write the inner loop as
for (int x = 0; x < extent.width / sizeof(float); ++x)
{
    float element = row[x];
so that it runs only 64 times per row.
Am I right?
Oh, yes, also another good catch. Thanks again.
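To make the two corrections concrete, here is a minimal host-only C++ sketch that mimics the pitched layout without needing a GPU. It is not the programming guide's code: Extent and PitchedPtr are hypothetical stand-ins that only mirror the width/height/depth and ptr/pitch fields of cudaExtent and cudaPitchedPtr, and the pitch of 512 bytes is an assumed value (on a device, cudaMalloc3D chooses the pitch). It shows the extent built with a width in bytes and the inner loop dividing that width by sizeof(float).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins mirroring the fields of cudaExtent and
// cudaPitchedPtr; they are NOT the CUDA types themselves.
struct Extent { std::size_t width, height, depth; };  // width is in BYTES
struct PitchedPtr { void* ptr; std::size_t pitch; };  // pitch is in BYTES

// Visit every float of a pitched 3D volume and return the number visited.
std::size_t visitFloats(PitchedPtr p, Extent e)
{
    char* base = static_cast<char*>(p.ptr);
    std::size_t slicePitch = p.pitch * e.height;  // bytes per 2D slice
    std::size_t visited = 0;
    for (std::size_t z = 0; z < e.depth; ++z)
    {
        char* slice = base + z * slicePitch;
        for (std::size_t y = 0; y < e.height; ++y)
        {
            float* row = reinterpret_cast<float*>(slice + y * p.pitch);
            // e.width is a byte count, so convert it to an element count.
            for (std::size_t x = 0; x < e.width / sizeof(float); ++x)
            {
                row[x] = 1.0f;
                ++visited;
            }
        }
    }
    return visited;
}

// Allocate a 64 x 64 x 64 float volume and walk it once.
std::size_t runExample()
{
    Extent e = { 64 * sizeof(float), 64, 64 };  // width given in bytes
    PitchedPtr p;
    p.pitch = 512;  // assumed padded row size in bytes (>= e.width)
    p.ptr = std::malloc(p.pitch * e.height * e.depth);
    if (p.ptr == nullptr) return 0;
    std::size_t n = visitFloats(p, e);
    std::free(p.ptr);
    return n;
}
```

Calling runExample() should visit 64 * 64 * 64 elements. If the inner loop used e.width directly as its bound, it would step sizeof(float) times too far along each row and run past the end of the allocation on the last rows.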
I completely concur; see my post:
cudaMemcpy3D and cudaMalloc, cudaDeviceToDevice copies
I wish I had seen this post first.