cudaMallocPitch() and cudaMemcpy2D()

I have a question about cudaMallocPitch() and cudaMemcpy2D().

float *X_h; X_h = (float *)malloc(N*K*sizeof(float));

where X_h[n*K+k] is the (n,k) element of X_h.

float *X_d;
cudaMallocPitch((void **) &X_d, &pitch_x, width*sizeof(float), height);

cudaMemcpy2D(X_d, pitch_x, X_h, width*sizeof(float), width*sizeof(float), height, cudaMemcpyHostToDevice);

According to the NVIDIA manual,
*((float *)((char *)X + pitch_x*n) + k); accesses the element in the nth row and kth column.
Why, in my case, am I accessing the kth row and the nth column? Is this a bug in CUDA 2.3?
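
The pitched addressing described above can be checked on the host without a GPU. The following is only a sketch under stated assumptions, not the actual CUDA calls: it assumes width = K (columns) and height = N (rows), which the post does not state. It emulates the pitched allocation with a padded host buffer and a hypothetical 64-byte row alignment. The helpers pitched_elem, copy2d and pitched_layout_matches are illustrative names, not part of the CUDA API; copy2d merely mimics the row-by-row copy that cudaMemcpy2D performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Address of element (row, col) in a pitched buffer; same arithmetic as the
   formula in the post: *((float *)((char *)X + pitch * row) + col). */
static float *pitched_elem(void *base, size_t pitch, size_t row, size_t col)
{
    return (float *)((char *)base + pitch * row) + col;
}

/* Host-only stand-in for a 2D copy: copies `height` rows of `width_bytes`
   bytes, stepping by spitch in the source and dpitch in the destination. */
static void copy2d(void *dst, size_t dpitch, const void *src, size_t spitch,
                   size_t width_bytes, size_t height)
{
    for (size_t r = 0; r < height; ++r)
        memcpy((char *)dst + r * dpitch, (const char *)src + r * spitch, width_bytes);
}

/* Returns 1 if every (n, k) read through the pitch formula equals X_h[n*K + k]. */
static int pitched_layout_matches(size_t N, size_t K)
{
    size_t width = K, height = N;  /* assumption: width = columns, height = rows */
    /* Hypothetical alignment: round each row up to a multiple of 64 bytes. */
    size_t pitch = ((width * sizeof(float) + 63) / 64) * 64;
    assert(pitch >= width * sizeof(float));

    float *X_h = (float *)malloc(N * K * sizeof(float));
    void *X_p = malloc(pitch * height);  /* padded buffer standing in for X_d */
    if (!X_h || !X_p) { free(X_h); free(X_p); return 0; }

    /* Fill the row-major host array so each element encodes its own (n, k). */
    for (size_t n = 0; n < N; ++n)
        for (size_t k = 0; k < K; ++k)
            X_h[n * K + k] = (float)(n * 100 + k);

    copy2d(X_p, pitch, X_h, width * sizeof(float), width * sizeof(float), height);

    int ok = 1;
    for (size_t n = 0; n < N; ++n)
        for (size_t k = 0; k < K; ++k)
            if (*pitched_elem(X_p, pitch, n, k) != X_h[n * K + k])
                ok = 0;

    free(X_h);
    free(X_p);
    return ok;
}
```

With these assumptions, pitched_layout_matches(N, K) checks that row n, column k of the padded buffer holds X_h[n*K+k]. If width and height were instead set to N and K, the copy would treat the same data as K rows of N floats, and the indices would no longer line up with the (n,k) convention. That is one possible explanation to rule out before suspecting the toolkit.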