__device__ float4

I have a few minor questions:

a) Aren't we allowed to use structures in device memory? That would also include float4 vectors, right?

b) Here is a small snippet of code. Please tell me where I'm going wrong:

[codebox]

#include <stdio.h>

__device__ float4 *gpurowsf4;

__global__ void cholesky()
{
    //Trying to modify global memory - Seg faulting!!
    gpurowsf4[0].x = 2634.0;
}

int main()
{
    //Some random numbers
    float4 *host,*gettingback;
    host = new float4[3];
    host[0]=make_float4(4,4.2,7,3.0);
    host[1]=make_float4(4,4.2,9,6.0);
    host[2]=make_float4(7,3.2,7,9.0);

    cudaSetDevice(1);

    cudaMalloc((void **)&gpurowsf4, 3*sizeof(float4));
    cudaMemcpy(gpurowsf4,host,3*sizeof(float4),cudaMemcpyHostToDevice);

    cholesky<<<1,3>>>();

    cudaMemcpy(gettingback,gpurowsf4,3*sizeof(float4),cudaMemcpyDeviceToHost);

    printf("Check print %f",gettingback[0].x);
}

[/codebox]

This code seg-faults, obviously because of something I'm doing wrong with float4. If I can really use float4, can I do something like this: load an array of floats and then read it as float4 in the kernel? Would that work, and would it give good performance? My gpurowsf4 array holds about a million float4s (i.e., four million floats), and my problem doesn't allow much shared memory reuse, so it's all going to be global memory.

Thanks,

Vandhan
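A minimal sketch of the idea asked about above (load a plain float array, then read it as float4 in the kernel), assuming the length is a multiple of 4. The array is allocated and copied as floats, and the kernel reinterprets the pointer as float4* so each thread does one 16-byte load and one 16-byte store. This relies on the base pointer being 16-byte aligned, which holds for pointers returned by cudaMalloc (they are at least 256-byte aligned) but not for arbitrary offsets into an array. The names scale_as_float4 and scale_on_gpu are hypothetical, and the scaling is only a stand-in workload.

```cuda
#include <cuda_runtime.h>

// Scales n floats by `factor`, accessing them four at a time through a
// float4 view of the same buffer. Assumes n % 4 == 0 and that `data` is
// 16-byte aligned (true for a pointer returned by cudaMalloc).
__global__ void scale_as_float4(float *data, int n, float factor)
{
    float4 *data4 = reinterpret_cast<float4 *>(data);
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n / 4) {
        float4 v = data4[i];  // one 16-byte load
        v.x *= factor;
        v.y *= factor;
        v.z *= factor;
        v.w *= factor;
        data4[i] = v;         // one 16-byte store
    }
}

// Hypothetical host helper: copies n floats to the GPU, runs the kernel,
// and copies the result back into `host`. Returns the first CUDA error.
cudaError_t scale_on_gpu(float *host, int n, float factor)
{
    // The float4 view only works for whole groups of four floats.
    if (n <= 0 || n % 4 != 0) return cudaErrorInvalidValue;

    float *dev = nullptr;
    size_t bytes = n * sizeof(float);
    cudaError_t err = cudaMalloc((void **)&dev, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n / 4 + threads - 1) / threads;  // one thread per float4
        scale_as_float4<<<blocks, threads>>>(dev, n, factor);
        err = cudaGetLastError();
    }
    // cudaMemcpy waits for the kernel to finish before copying back.
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev);
    return err;
}
```

Whether the wider loads actually help depends on the device and the access pattern, so it is worth timing this against a plain one-float-per-thread version before relying on it.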

Yes, you are allowed to use structures in global memory. How are you initializing "gpurowsf4"? You need to use cudaMemcpyToSymbol to initialize __device__ variables from the host.
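A minimal sketch of the cudaMemcpyToSymbol route, keeping the original __device__ float4 *gpurowsf4 declaration. The array is allocated with cudaMalloc into an ordinary host-side pointer, and then the pointer value itself is copied into the device symbol. Host code cannot use &gpurowsf4 as the address of the device copy, so in the original code cudaMalloc never updates the pointer the kernel reads (and gettingback is also used without being allocated). The helper run_with_symbol is a hypothetical name; it uses the cudaMemcpyToSymbol overload that takes the symbol directly.

```cuda
#include <cuda_runtime.h>

// Global-scope device variable: a pointer stored in device memory.
__device__ float4 *gpurowsf4;

__global__ void cholesky()
{
    gpurowsf4[0].x = 2634.0f;
}

// Hypothetical helper: allocates n float4s, publishes the device address
// through the __device__ variable, runs the kernel, and copies the array
// back into `host`. Returns the first CUDA error.
cudaError_t run_with_symbol(float4 *host, int n)
{
    float4 *dev = nullptr;
    size_t bytes = n * sizeof(float4);
    cudaError_t err = cudaMalloc((void **)&dev, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    // Copy the pointer value (not the array) into the device symbol.
    if (err == cudaSuccess)
        err = cudaMemcpyToSymbol(gpurowsf4, &dev, sizeof(dev));
    if (err == cudaSuccess) {
        cholesky<<<1, 1>>>();
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev);
    return err;
}
```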

Otherwise, it is usually simpler just to pass pointers into your __global__ function:

[codebox]
__global__ void cholesky(float4 *gpurowsf4)
{
    gpurowsf4[0].x = 2634.0;
}
[/codebox]
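A minimal sketch of the pointer-argument version, written as a hypothetical helper run_with_argument that mirrors the original main(). The device array is an ordinary pointer allocated with cudaMalloc and passed to the kernel, and the result is copied into storage supplied by the caller, since the original gettingback pointer was never allocated.

```cuda
#include <cuda_runtime.h>

// The array pointer is an ordinary kernel argument; the float4 data it
// points to stays in global memory.
__global__ void cholesky(float4 *gpurowsf4)
{
    gpurowsf4[0].x = 2634.0f;
}

// Hypothetical helper: copies n float4s to the device, runs the kernel, and
// copies them into `gettingback`, which must point to caller-allocated
// storage for n float4s. Returns the first CUDA error.
cudaError_t run_with_argument(const float4 *host, float4 *gettingback, int n)
{
    float4 *gpurows = nullptr;
    size_t bytes = n * sizeof(float4);
    cudaError_t err = cudaMalloc((void **)&gpurows, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(gpurows, host, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        cholesky<<<1, 1>>>(gpurows);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(gettingback, gpurows, bytes, cudaMemcpyDeviceToHost);

    cudaFree(gpurows);
    return err;
}
```

A caller would declare, for example, float4 back[3], call run_with_argument(host, back, 3), check for cudaSuccess, and then print back[0].x. Only the pointer value is passed as the kernel argument. On compute capability 1.x devices kernel arguments were stored in shared memory, and on later devices in constant memory, but that applies to the argument values themselves; the float4 array they point to remains in global memory.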

But doesn't passing it as a function parameter put it into shared memory? (I remember reading this somewhere.)

Could you also please provide a small snippet showing how to do what you suggested?

Thanks!