I had some minor doubts…
a) Aren’t we allowed to use structures in device memory? That would include float4 vectors also, right?
b) Here is a small snippet of code. Please tell me where I’m going wrong…
[codebox]
#include <stdio.h>

__device__ float4 *gpurowsf4;

__global__ void cholesky()
{
    // Trying to modify global memory - Seg faulting!!
    gpurowsf4[0].x = 2634.0;
}

int main()
{
    // Some random numbers
    float4 *host, *gettingback;
    host = new float4[3];
    host[0] = make_float4(4, 4.2, 7, 3.0);
    host[1] = make_float4(4, 4.2, 9, 6.0);
    host[2] = make_float4(7, 3.2, 7, 9.0);
    cudaSetDevice(1);
    cudaMalloc((void **)&gpurowsf4, 3 * sizeof(float4));
    cudaMemcpy(gpurowsf4, host, 3 * sizeof(float4), cudaMemcpyHostToDevice);
    cholesky<<<1, 3>>>();
    cudaMemcpy(gettingback, gpurowsf4, 3 * sizeof(float4), cudaMemcpyDeviceToHost);
    printf("Check print %f", gettingback[0].x);
}
[/codebox]
This code seg-faults, obviously due to something that I’m doing wrong with float4. If I really can use float4, can I do something like this: load an array of floats and then read it as float4 in the kernel? Would that work? And would it give good performance? My gpurowsf4 is about a million floats (*4), and my problem doesn’t allow me to do much shared memory access, so it is all going to be global memory.
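To make the question concrete, here is a rough sketch of what I have in mind. It is not a confirmed fix. It assumes two things about the code above: that `gettingback` is never allocated on the host, so copying into it is what crashes, and that calling `cudaMalloc` on the address of the device-side global `gpurowsf4` from the host does not set the pointer the kernel actually sees. The sketch therefore passes the buffer as a kernel argument and reads a plain float array as float4. The names `touch_as_float4`, `dev` and `back` are made up for illustration. It also assumes the float count is a multiple of 4 and that the view starts at the beginning of the `cudaMalloc` allocation, which is aligned well beyond the 16 bytes that float4 needs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: receives the buffer as an argument instead of going
// through a device-side global pointer, and views the floats as float4.
__global__ void touch_as_float4(float *data, int n4)
{
    // Assumption: data is 16-byte aligned (true for the start of a
    // cudaMalloc allocation) and holds 4 * n4 floats.
    float4 *v = reinterpret_cast<float4 *>(data);

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n4) {
        float4 e = v[i];   // one 16-byte load per thread
        if (i == 0) {
            e.x = 2634.0f; // same write as in the original kernel
        }
        v[i] = e;          // one 16-byte store per thread
    }
}

int main()
{
    const int n4 = 3;      // number of float4 elements
    const int n = 4 * n4;  // number of floats

    // Same sample values as above, stored as a flat float array.
    float host[n] = {4, 4.2f, 7, 3, 4, 4.2f, 9, 6, 7, 3.2f, 7, 9};
    float back[n];         // host destination exists before the copy back

    float *dev = nullptr;
    if (cudaMalloc((void **)&dev, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    touch_as_float4<<<1, 32>>>(dev, n4);

    // Kernel launches are asynchronous; check for errors explicitly.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(dev);
        return 1;
    }

    cudaMemcpy(back, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("Check print %f\n", back[0]);

    cudaFree(dev);
    return 0;
}
```

If the global pointer has to stay, my understanding is that the address returned by `cudaMalloc` would need to be copied into it with `cudaMemcpyToSymbol` rather than written from the host directly. Whether the float4 view is actually faster than plain float access for a purely global-memory workload is something I would still have to measure.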
Thanks,
Vandhan