3D Array - Access Issue

I am trying to use a simple 3D array for Halton number generation. The algorithm itself is correct, because I tried the same code without CUDA.
In the statement row[j] = h, h holds the generated quasi-random values.
But when I check it in host memory there is no data, and all it prints is 0.
So my issue is how to access it. I don’t know much about cudaMemcpy3D, so if possible please correct the code rather than suggesting I use cudaMemcpy3D ;)

If there is any other mistake, let me know.


int width = 50, height = 3, depth = 10;

__global__ void random3dGen(cudaPitchedPtr devPitchedPtr);

int main()
{
    //host code
    cudaExtent extent = make_cudaExtent(width * sizeof(float), height, depth);
    cudaPitchedPtr devPitchedPtr;

    char* devPtr = (char*)devPitchedPtr2.ptr;
    size_t pitch = devPitchedPtr2.pitch;
    size_t slicePitch = pitch * 3;
    char* slice = devPtr + 1 * slicePitch;
    float* row = (float*)(slice + 2 * pitch);

    for(int i=0;i<10;i++)

    return 0;
}

__global__ void random3dGen(cudaPitchedPtr devPitchedPtr)
{
    long n1, i;
    float h, ib;
    float n0;
    //Simple Halton Number Generator
    float N = 16.0;//seeder - Animesh
    int components = 3;
    int primearray[] = {2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,179,181,191,193,197,199,211,223,227,229};

    char* devPtr = (char*)devPitchedPtr.ptr;
    size_t pitch = devPitchedPtr.pitch;
    size_t slicePitch = pitch * 3;

    int b;

    char* slice = devPtr + (threadIdx.x % 10) * slicePitch;
    for(int p=0;p<components;p++)
    {
        float* row = (float*)(slice + p * pitch);
        n0 = N++;
        for(int j=0;j<50;j++)
        {
            b = primearray[j];
            n0 = N;
            h = 0;
            ib = (float)(1.0/b);
            while(n0 > 0)   // header assumed; this line is missing from the post
            {
                n1 = (int)(n0/b);
                i = n0 - n1*b;
                h = h + ib*i;
                ib = ib/b;
                n0 = n1;
            }
            row[j] = h;
            //	printf("%f",row[j]);
        }
    }
    N = N + 2;
}


Do you expect the code you posted to actually do anything? If so, what might you expect the results to be?

It generates Halton numbers, as I mentioned in my post. Please read it again.

I want to access the generated Halton numbers in my host code.

I read your post and the code. It contains no kernel launch. Leaving aside the multitude of other problems with the code, how do you imagine the kernel will produce random numbers if it is never called by the host?

I skipped the kernel call while copy-pasting. Anyway, after the kernel call, say


What is the multitude of other problems?

Easy to see you know nothing about NVIDIA CUDA. You are just here to increase your post count with silly, mindless posts.

Thanks for playing.

How about adding some memory allocation for the device and some memory transfers between host memory and device memory?
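For what it's worth, here is a minimal sketch of what that could look like. It is not the original poster's program: the names fillHalton, generateOnDevice, radicalInverse, check and kWidth are made up for illustration, the width is cut to 8 columns to keep the prime table short, and the mapping of Halton indices to (x, y, z) is an assumption. It does use cudaMemcpy3D, since that is the copy call that pairs naturally with cudaMalloc3D. The key point is that the pointer inside a cudaPitchedPtr is a device address, so the host has to copy the data back before reading it.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical size for this sketch (the post uses 50 columns).
constexpr int kWidth = 8;   // columns; one prime base per column

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Radical inverse of n in the given base, i.e. one Halton value.
__host__ __device__ float radicalInverse(int n, int base) {
    float h = 0.0f;
    float ib = 1.0f / base;
    while (n > 0) {
        h += ib * (n % base);
        ib /= base;
        n /= base;
    }
    return h;
}

// One thread per depth slice z. Element (x, y, z) gets the Halton value
// with index seed + z * height + y in base primes[x] (an assumed layout).
__global__ void fillHalton(cudaPitchedPtr p, int height, int depth, int seed) {
    const int primes[kWidth] = {2, 3, 5, 7, 11, 13, 17, 19};
    int z = blockIdx.x * blockDim.x + threadIdx.x;
    if (z >= depth) return;
    // Slices are pitch * height bytes apart; rows are pitch bytes apart.
    char* slice = (char*)p.ptr + (size_t)z * p.pitch * height;
    for (int y = 0; y < height; ++y) {
        float* row = (float*)(slice + (size_t)y * p.pitch);
        for (int x = 0; x < kWidth; ++x)
            row[x] = radicalInverse(seed + z * height + y, primes[x]);
    }
}

// Allocate pitched device memory, run the kernel, and copy the result into
// a densely packed host vector indexed as (z * height + y) * kWidth + x.
// Assumes depth fits in a single thread block.
std::vector<float> generateOnDevice(int height, int depth, int seed) {
    cudaExtent extent = make_cudaExtent(kWidth * sizeof(float), height, depth);
    cudaPitchedPtr dev;
    check(cudaMalloc3D(&dev, extent), "cudaMalloc3D");

    fillHalton<<<1, depth>>>(dev, height, depth, seed);
    check(cudaGetLastError(), "kernel launch");

    std::vector<float> host((size_t)kWidth * height * depth);
    cudaMemcpy3DParms copy = {};
    copy.srcPtr = dev;
    copy.dstPtr = make_cudaPitchedPtr(host.data(), kWidth * sizeof(float),
                                      kWidth, height);
    copy.extent = extent;
    copy.kind = cudaMemcpyDeviceToHost;
    check(cudaMemcpy3D(&copy), "cudaMemcpy3D");

    check(cudaFree(dev.ptr), "cudaFree");
    return host;
}
```

Calling generateOnDevice(3, 10, 16) from main returns 8 * 3 * 10 floats in host memory. With the assumed layout, the first element is the radical inverse of 16 in base 2, which is 1/32.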

And pitched pointers are intended for allocating memory for two- and three-dimensional arrays, which themselves are opaque structures intended only for manipulation via the texture APIs. The basic premise of your code suggests you should be (re)reading the CUDA documentation and looking at the SDK examples, because it is nonsense as written. And perhaps learn some manners at the same time.

I think the best approach is just as your code indicates… Just copy and paste your C code, add a few CUDA-specific extensions like “__global__”, and your code will run beautifully on the GPU!

Now that they’ve released Fermi with full C++ support you don’t even need to think about the fact that the code is going to run on a GPU architecture. Just copy paste and compile, und voila! Acceleration! Quick and easy, without any hassles!

Yeah that’s how it works… Enjoying the free lunch so far?