At the moment I have around 60 arguments that I need to pass to a CUDA kernel.
To do that, I was copying a host struct to a device struct through pointers (element by element, as shown in NVIDIA_CUDA_Programming_Guide_2.2.pdf).
But I am not able to access the struct members inside the kernel through the pointer with the -> operator (the compiler gives me: Warning: Cannot tell what pointer points to, assuming global memory space).
Can anybody please suggest a better way to pass around 60 parameters to a CUDA kernel?
Constant memory is another possibility. You can write constants and the addresses of device pointers into constant memory before you launch the kernel, and they are then available to the kernel when it executes. There is a constant memory cache and a broadcast mechanism, so performance-wise it should be little different from passing the same data as kernel arguments.
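A minimal sketch of the constant memory approach, in case it helps. The names KernelParams, c_params, scaleKernel and runConstantParamsDemo are made up for illustration, and the struct has only four members instead of 60. The struct is filled on the host (its pointer members hold addresses returned by cudaMalloc) and then written to the __constant__ symbol with cudaMemcpyToSymbol.

```cuda
#include <cuda_runtime.h>

// Hypothetical parameter block: plain values plus device pointers.
struct KernelParams {
    float *p00;    // device address of the input array
    float *p50;    // device address of the output array
    int    n;      // number of elements
    float  scale;  // an ordinary scalar parameter
};

// One copy in constant memory, readable by every kernel in this file.
__constant__ KernelParams c_params;

__global__ void scaleKernel()
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < c_params.n)
        c_params.p50[i] = c_params.scale * c_params.p00[i];
}

// Returns true if the kernel saw the parameters written to constant memory.
bool runConstantParamsDemo()
{
    const int n = 4;
    const size_t bytes = n * sizeof(float);
    float h_in[n]  = {10.0f, 11.0f, 12.0f, 13.0f};
    float h_out[n] = {0.0f, 0.0f, 0.0f, 0.0f};

    // Staging copy in host memory; its pointer members hold device addresses.
    KernelParams h_params = {nullptr, nullptr, n, 2.0f};

    bool ok = cudaMalloc((void**)&h_params.p00, bytes) == cudaSuccess
           && cudaMalloc((void**)&h_params.p50, bytes) == cudaSuccess
           && cudaMemcpy(h_params.p00, h_in, bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess
           // The source struct is in host memory, so the default
           // host-to-device direction of cudaMemcpyToSymbol applies.
           && cudaMemcpyToSymbol(c_params, &h_params,
                                 sizeof(KernelParams)) == cudaSuccess;
    if (ok) {
        scaleKernel<<<1, n>>>();
        ok = cudaMemcpy(h_out, h_params.p50, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(h_params.p00);
    cudaFree(h_params.p50);

    for (int i = 0; ok && i < n; ++i)
        ok = (h_out[i] == 2.0f * h_in[i]);
    return ok;
}
```

The kernel takes no arguments at all here; everything it needs comes from c_params. Note that the pointer values are constant, but the arrays they point to are ordinary global memory, so the kernel can still write through p50.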
Even though you get the warning, it should still work perfectly fine (provided the assumption about global memory is actually correct, which in this case it seems to be).
I use a structure of dynamically allocated arrays for a couple of implementations and I also get one of those warnings anytime I access an array through the struct pointer, but the implementation works fine and I don’t appear to have any coalescing issues within the arrays themselves.
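A rough sketch of what a structure of dynamically allocated arrays can look like when the struct is passed to the kernel by value. ArrayPair, addTenKernel and runStructByValueDemo are hypothetical names, not the code from my implementation. One caveat I am fairly sure of: kernel arguments are limited to 256 bytes on compute capability 1.x devices (4 KB on 2.x and later), so a struct with 60 pointers fits by value on 1.x only if the pointers are 4 bytes wide.

```cuda
#include <cuda_runtime.h>

// Hypothetical struct: the members are device pointers, the struct itself
// is an ordinary host object that gets copied into the kernel arguments.
struct ArrayPair {
    float *p00;
    float *p50;
    int    n;
};

__global__ void addTenKernel(ArrayPair s)   // passed by value, so use '.'
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < s.n)
        s.p50[i] = s.p00[i] + 10.0f;
}

// Returns true if the kernel produced p50[i] = p00[i] + 10 for every element.
bool runStructByValueDemo()
{
    const int n = 4;
    const size_t bytes = n * sizeof(float);
    float h_in[n]  = {10.0f, 11.0f, 12.0f, 13.0f};
    float h_out[n] = {0.0f, 0.0f, 0.0f, 0.0f};

    ArrayPair s = {nullptr, nullptr, n};

    bool ok = cudaMalloc((void**)&s.p00, bytes) == cudaSuccess
           && cudaMalloc((void**)&s.p50, bytes) == cudaSuccess
           && cudaMemcpy(s.p00, h_in, bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // The launch copies the struct (three small fields) to the device.
        addTenKernel<<<1, n>>>(s);
        ok = cudaMemcpy(h_out, s.p50, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(s.p00);
    cudaFree(s.p50);

    for (int i = 0; ok && i < n; ++i)
        ok = (h_out[i] == h_in[i] + 10.0f);
    return ok;
}
```

Because the struct travels with the launch, there is no separate cudaMalloc or cudaMemcpy for the struct itself, only for the arrays it points to.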
No, that is not working; I am again getting garbage values inside the kernel.
But one interesting thing: once, by mistake, I copied a struct pointer (one that I created on the host and whose elements I then set up on the device, element by element as I did earlier) to a pointer that I had allocated on the device (like you were doing).
Surprisingly, that worked without any memcpy error, but only when I used “cudaMemcpyHostToDevice”.
When I tried “cudaMemcpyDeviceToDevice” (which is the right way, I guess **), it again did not work.
Any idea?
Please let me know if my English was not clear.
**
Since I allocate the memory for every element of the structure (element by element) on the device, the entire structure has to be on the device (am I right?). So when I copy such a struct to a struct defined by your method, the copy should be device to device (I am not so clear on this point).
But in my case I get garbage values the moment I try to access the elements (->) inside the kernel. Allocation in global memory is not the problem, since I get the elements back in host memory if I don't go through the kernel; but once it goes through the kernel, it gives me garbage values.
testStruct *h_sn = new testStruct; // host struct
testStruct *d_sn = new testStruct; // struct for the device pointers (the struct itself is allocated on the host)
setNull(h_sn); // initialise each element to null
setNull(d_sn); // initialise each element to null for the device struct
//testStruct d_sn;
size_t sizeP = 4*sizeof(float); // size of each array in bytes
initStruct(h_sn); // allocate memory for each element on the host
MallocCUDDA(d_sn, sizeP); // allocate memory for each element on the device
for(int i=0; i<4;i++)
{
    h_sn->p00[i] = float(i+10) ; // fill the host arrays with test values
    h_sn->p50[i] = float(i*4) ;
}
cuddaMemCopy(d_sn, h_sn, sizeP); // copy each element from host to device
goCuda(d_sn); // pass the device struct to CUDA
free(h_sn->p00); free(h_sn->p50);
cudaFree(d_sn->p00);cudaFree(d_sn->p50);
/*h_sn.p00 = new float[4];
h_sn.p50 = new float[4];*/
/*
cudaMalloc((void**)&d_sn.p00, sizeP);
cudaMalloc((void**)&d_sn.p50, sizeP);
int arry[4][4];
arry[0][1] = 2222222;
float a = 20.0;
cudaMemcpy(d_sn.p00, h_sn.p00, sizeP,cudaMemcpyHostToDevice);
cudaMemcpy(d_sn.p50, h_sn.p50, sizeP,cudaMemcpyHostToDevice);
for(int i=0; i<4;i++)
cout << h_sn.p00[i] << endl;
goCuda(d_sn, arry);
*/
When I copy d_sn, which is a device struct pointer, using cudaMemcpyHostToDevice, it works, but if I use cudaMemcpyDeviceToDevice it does not work.
And if I use the h_sn pointer I get no values.
It is quite a tedious job, but you can simply copy the code into three files:
commonStruct.h - put the struct there
main.cpp - copy the main code there
cudaSolver.cu - copy the CUDA code there. That will definitely give you a test platform in a new VC++ project.
Please let me know if something is not understandable.
But anyway I want to thank you for all the help from your side.
I was also explaining something like this to myself; fortunately, your point bolsters my assumption.
Since the allocation was done on the host, the copy should be from host to device, OK.
But now, coming back to your example, where both the host struct pointer and its elements are on the host (h_sn in my case): the struct does get passed to the device when I copy it, no doubt.
But when I pass the pointer as an argument to the kernel, I get garbage values.
In short, I mean that copying an entire true host struct (one that truly resides on the host) to the device does not work inside the kernel.
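For what it is worth, here is a minimal sketch of the pattern that seems to match what was observed above. It assumes a two-member struct like the one in this thread; TestStruct, copyKernel and runStructPointerDemo are names invented for the example. A struct whose members still hold host addresses would make the kernel dereference host pointers, which would explain garbage values. The sketch instead keeps a staging struct in host memory whose members hold device addresses, and copies that struct to a device-side struct with cudaMemcpyHostToDevice, because the staging struct itself lives on the host.

```cuda
#include <cuda_runtime.h>

// Hypothetical two-member version of the struct discussed in the thread.
struct TestStruct {
    float *p00;
    float *p50;
};

__global__ void copyKernel(TestStruct *s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        s->p50[i] = s->p00[i];   // s and both arrays are in device memory
}

// Returns true if the kernel could follow the struct pointer with '->'.
bool runStructPointerDemo()
{
    const int n = 4;
    const size_t bytes = n * sizeof(float);
    float h_in[n]  = {10.0f, 11.0f, 12.0f, 13.0f};
    float h_out[n] = {0.0f, 0.0f, 0.0f, 0.0f};

    // Staging struct in HOST memory; its members will hold DEVICE addresses.
    TestStruct staging = {nullptr, nullptr};
    // The struct that the kernel will actually read, in DEVICE memory.
    TestStruct *d_struct = nullptr;

    bool ok = cudaMalloc((void**)&staging.p00, bytes) == cudaSuccess
           && cudaMalloc((void**)&staging.p50, bytes) == cudaSuccess
           && cudaMalloc((void**)&d_struct, sizeof(TestStruct)) == cudaSuccess
           && cudaMemcpy(staging.p00, h_in, bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess
           // Source is the host-resident staging struct: host to device.
           && cudaMemcpy(d_struct, &staging, sizeof(TestStruct),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        copyKernel<<<1, n>>>(d_struct, n);
        ok = cudaMemcpy(h_out, staging.p50, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(staging.p00);
    cudaFree(staging.p50);
    cudaFree(d_struct);

    for (int i = 0; ok && i < n; ++i)
        ok = (h_out[i] == h_in[i]);
    return ok;
}
```

The direction of each cudaMemcpy follows where the source and destination bytes live, not what the pointers inside them point to. That is why HostToDevice is the matching kind for the struct copy here, even though every member of the struct is a device address.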