cudaMallocPitch

Hello,
I have a few questions regarding cudaMallocPitch:

  1. I’m trying to allocate a 1024 by 1024 array of floats on the device. Can I do this, or is 1024 by 1024 too large? (This is 4 MB, right?)

  2. Assuming I can do this, and that cudaMallocPitch is the best way, is there a way to access row r, column c in a single line, without the separate float* row = ... statement?
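For question 1, the arithmetic checks out: 1024 x 1024 elements x 4 bytes per float is 4,194,304 bytes, exactly 4 MiB, which is small compared with the global memory on CUDA GPUs. With cudaMallocPitch the footprint is height x pitch rather than height x width x sizeof(float), because each row may be padded. Below is a minimal sketch in plain C++ (it also compiles under nvcc) that computes both sizes. The `alignment` parameter is an assumption for illustration only, since the driver chooses the real pitch.

```cpp
#include <cstddef>

// Bytes for a dense width x height array with no row padding.
constexpr std::size_t denseBytes(std::size_t width, std::size_t height,
                                 std::size_t elemSize)
{
    return width * height * elemSize;
}

// Round n up to the next multiple of alignment (alignment must be > 0).
constexpr std::size_t roundUp(std::size_t n, std::size_t alignment)
{
    return (n + alignment - 1) / alignment * alignment;
}

// Estimated footprint of a pitched allocation, assuming each row is padded
// to a multiple of `alignment` bytes. The real pitch is chosen by
// cudaMallocPitch and returned through its pitch argument; `alignment` here
// is an assumed value for illustration only.
constexpr std::size_t pitchedBytes(std::size_t width, std::size_t height,
                                   std::size_t elemSize, std::size_t alignment)
{
    return roundUp(width * elemSize, alignment) * height;
}

// 1024 x 1024 floats at 4 bytes each = 4,194,304 bytes = 4 MiB.
static_assert(denseBytes(1024, 1024, 4) == 4u * 1024u * 1024u,
              "1024 x 1024 4-byte floats occupy exactly 4 MiB");
```

A 1024-float row is 4096 bytes, which is likely already a multiple of the driver's alignment, so the pitch may equal 4096 and add no padding. On a real device, though, the pitch value written by cudaMallocPitch is the authoritative row stride, and cudaMemGetInfo reports how much device memory is currently free.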

The programming guide shows this:

// host code
float* devPtr;
size_t pitch;
cudaMallocPitch((void**)&devPtr, &pitch, width * sizeof(float), height);
myKernel<<<100, 192>>>(devPtr, pitch, width, height);

// device code
__global__ void myKernel(float* devPtr, size_t pitch, int width, int height)
{
    for (int r = 0; r < height; ++r) {
        float* row = (float*)((char*)devPtr + r * pitch);
        for (int c = 0; c < width; ++c) {
            float element = row[c];
        }
    }
}
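For question 2: because pitch is a byte count, the row offset has to be added to a char* before casting back to float*. The whole access still fits in one expression: float element = *((float*)((char*)devPtr + r * pitch) + c); Below is a minimal sketch that wraps that expression in a helper, pitchedAt (a hypothetical name, not part of the CUDA API). It also shows the helper in a hypothetical kernel, scaleKernel, that assigns one thread per element instead of having every thread loop over the whole array. The HOST_DEVICE macro is an assumption that lets the helper compile as plain C++ for host-side testing. The kernel is compiled only under nvcc.

```cpp
#include <cstddef>

// Mark helpers __host__ __device__ under nvcc; plain inline functions otherwise.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: reference to element (r, c) of a pitched 2D array.
// `pitch` is the row stride in BYTES (as returned by cudaMallocPitch), so the
// row offset is added to a char* before converting back to T*.
template <typename T>
HOST_DEVICE inline T& pitchedAt(T* base, std::size_t pitch, int r, int c)
{
    return reinterpret_cast<T*>(reinterpret_cast<char*>(base) + r * pitch)[c];
}

#ifdef __CUDACC__
// Hypothetical kernel: each thread scales one element. Launch it with a 2D
// grid whose threads cover at least width x height.
__global__ void scaleKernel(float* devPtr, std::size_t pitch, int width,
                            int height, float s)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;  // column index
    int r = blockIdx.y * blockDim.y + threadIdx.y;  // row index
    if (r < height && c < width) {
        pitchedAt(devPtr, pitch, r, c) *= s;  // single-line (r, c) access
    }
}
#endif
```

A launch might look like dim3 block(16, 16); dim3 grid((width + 15) / 16, (height + 15) / 16); scaleKernel<<<grid, block>>>(devPtr, pitch, width, height, 2.0f); where devPtr and pitch come from the cudaMallocPitch call in the host code above. The bounds check matters because the grid is rounded up to whole 16 x 16 blocks.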

Thanks in advance for your help,
Joe

Sorry, I posted this in the wrong place; it should have been in the programming forum. Please delete this thread.